CONTRIBUTIONS TO THE THEORY OF STATISTICAL ESTIMATION
AND TESTING HYPOTHESES¹


By ABRAHAM WALD 


1. Introduction. Let us consider a family of systems of $n$ variates
$X_1(\theta^{(1)}, \dots, \theta^{(k)}), \dots, X_n(\theta^{(1)}, \dots, \theta^{(k)})$
depending on $k$ parameters $\theta^{(1)}, \dots, \theta^{(k)}$.
A system of $k$ values $\theta^{(1)}, \dots, \theta^{(k)}$ can be represented in the
$k$-dimensional parameter space by the point $\theta$ with the co-ordinates
$\theta^{(1)}, \dots, \theta^{(k)}$. Denote by $\Omega$ the set of all possible points
$\theta$. For any point $\theta$ of $\Omega$ we shall denote by $P(E \in w \mid \theta)$
the probability that the sample point $E = (x_1, \dots, x_n)$ falls into the region $w$
of the $n$-dimensional sample space, where $x_j$ denotes the observed value of the
variate $X_j(\theta)$ $(j = 1, \dots, n)$. The distribution $P(E \in w \mid \theta)$ is
supposed to be known for any point $\theta$ of $\Omega$. In the theory of testing
hypotheses and of statistical estimation we have to deal with problems of the
following type: A sample point $E = (x_1, \dots, x_n)$ of the $n$-dimensional sample
space is given. We know that $x_j$ is the observed value of $X_j(\theta)$ but we do not
know the parameter point $\theta$, and we have to draw inferences about $\theta$ by
means of the sample point observed. The assumption that $\theta$ belongs to a certain
subset $\omega$ of $\Omega$ is called a hypothesis. We shall deal in this paper with
the following general problem: Let us consider a system $S$ of subsets of $\Omega$.
Denote by $H_\omega$ the hypothesis corresponding to the element $\omega$ of $S$, and
by $H_S$ the system of all hypotheses corresponding to all elements of $S$. We have to
decide by means of the observed sample point $E$ which hypothesis of the system $H_S$
should be accepted. That is to say, for each $H_\omega$ we have to determine a region
of acceptance $M_\omega$ in the $n$-dimensional sample space. The hypothesis
$H_\omega$ will be accepted if and only if the sample point $E$ falls in the region
$M_\omega$. $M_\omega$ and $M_{\omega'}$ are disjoint if $\omega \neq \omega'$.
The statistical problem is the question as to how the system $M_S$ of all regions
$M_\omega$ should be chosen.

The problem in this formulation is very general. It contains the problems of
testing hypotheses and of statistical estimation treated in the literature.² For
instance, if we want to test the hypothesis $H_\omega$ corresponding to a certain
subset $\omega$ of $\Omega$, the system of hypotheses $H_S$ consists only of the two
hypotheses $H_\omega$ and $H_{\bar\omega}$, where $\bar\omega$ denotes the subset of
$\Omega$ complementary to $\omega$. If we want to estimate $\theta$ by a unique point,
then $S$ is the system of all points of $\Omega$. In the theory of confidence
intervals we estimate one of the parameter co-ordinates
$\theta^{(1)}, \dots, \theta^{(k)}$, say $\theta^{(1)}$, by an interval. In this case
$S$ is a certain system of subsets $\omega$ of the following type: $\omega$ is the set
of all points $\theta = (\theta^{(1)}, \dots, \theta^{(k)})$ for which $\theta^{(1)}$
lies in a certain interval $[a, b]$. The problem in our formulation covers also cases
which, as far as I know, have not yet been treated. Consider for instance 3 subsets
$\omega_1$, $\omega_2$ and $\omega_3$ of $\Omega$ such that the sum of them is equal to
$\Omega$. It may be that we are interested only to know in which of the subsets
$\omega_1$, $\omega_2$, $\omega_3$ the unknown parameter point lies. In this case the
system of hypotheses $H_S$ consists only of the 3 hypotheses $H_{\omega_1}$,
$H_{\omega_2}$ and $H_{\omega_3}$. Cases like this might be of practical interest.

¹ Research under a grant-in-aid from the Carnegie Corporation of New York.

² See, for instance, J. Neyman, "Outline of a Theory of Statistical Estimation Based on
the Classical Theory of Probability," Phil. Transactions of the Royal Society, London,
Vol. 236 (1937), pp. 333-380.

For the determination of the "best" system (in a certain sense) of regions of
acceptance we shall use methods and principles which are closely related to those
of the Neyman-Pearson theory of testing hypotheses. In the Neyman-Pearson
theory two types of error are considered. Let $\theta = \theta_1$ be the hypothesis
to be tested, where $\theta_1$ denotes a certain point of the parameter space. Denote
this hypothesis by $H_1$ and the hypothesis $\theta \neq \theta_1$ by $\bar H$. The
type I error is that which is made by rejecting $H_1$ when it is true. The type II
error is made by accepting $H_1$ when it is false. The fundamental principle in the
Neyman-Pearson theory can be formulated as follows: among all critical regions
(regions of rejection of $H_1$, i.e. regions of acceptance of $\bar H$) for which the
probability of type I error is equal to a given constant $\alpha$, we have to choose
that region for which the probability of type II error is a minimum. The difficulty
which arises here lies in the circumstance that the probability of type II error
depends on the true parameter point $\theta$. That is to say, if the critical region
is given, the probability of type II error will be a function of the true parameter
point $\theta$. Since we do not know the true parameter point $\theta$, we want to
have a critical region which minimizes the probability of type II error with respect
to any possible alternative hypothesis $\theta = \theta_2 \neq \theta_1$. If such a
common best critical region exists, then the problem is solved. But such cases are
rather exceptional. If a common best critical region does not exist, Neyman and
Pearson consider unbiased critical regions of different types,³ which minimize the
type II error locally, that is to say with respect to alternative hypotheses in the
neighborhood of the hypothesis considered. In this paper we develop methods for the
determination of a system of regions of acceptance taking into account type II errors
also relative to alternative hypotheses not lying in the neighborhood of the
hypothesis to be tested.
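
The dependence of the type II error on the true parameter point can be made concrete
with a small numerical sketch. Everything specific in it is an illustrative assumption
and not taken from the paper: a binomial sample of 20 trials, the hypothesis
$\theta = 1/2$, and a two-sided critical region chosen by hand.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def rejection_prob(theta, n=20, c=5):
    """Probability that the number of successes X falls in the (assumed)
    critical region {X <= c} or {X >= n - c} when theta is the true value."""
    return sum(binom_pmf(k, n, theta) for k in range(n + 1) if k <= c or k >= n - c)

def type_two_error(theta, n=20, c=5):
    """Probability of accepting theta = 1/2 when the true value is theta."""
    return 1.0 - rejection_prob(theta, n, c)

# Type I error: probability of rejecting when theta = 1/2 is true.
size = rejection_prob(0.5)
print("type I error:", round(size, 4))

# The type II error is a function of the alternative considered.
for alternative in (0.1, 0.3, 0.45):
    print("theta =", alternative, "type II error:", round(type_two_error(alternative), 4))
```

For one and the same critical region the type II error is small for alternatives far
from 1/2 and large for alternatives close to it, which is why a single region that is
best against every alternative is the exception rather than the rule.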


2. Some Definitions. Let us denote by $\Omega$ the set of all possible parameter
points $\theta$ and by $S$ a system of subsets of $\Omega$. If $\rho$ denotes the sum
of the elements of a subset $\sigma$ of $S$, then we shall denote $\sum M_\omega$ by
$M_\rho$, where $M_\omega$ denotes the region of acceptance of $H_\omega$ and the
summation is to be taken over all elements $\omega$ of $\sigma$.

³ J. Neyman and E. S. Pearson: Statistical Research Memoirs, Volumes I and II. The
authors consider also unbiased regions of type $A_1$, for which the probability of
type II error with respect to every alternative hypothesis is not greater than for any
other unbiased region of the same size. However, regions of type $A_1$ do not always
exist (the existence of such regions has been proved for a special but important class
of cases).

Definition 1. Denote by $M_S$ and $M'_S$ two different systems of regions of
acceptance corresponding to the same system $H_S$ of hypotheses. The systems
$M_S$ and $M'_S$ are said to be equivalent if for each point $\theta$ of $\Omega$ and
for every $\rho$ which is a sum of elements of $S$ which does not contain $\theta$,
the equation

    $P(E \in M_\rho \mid \theta) = P(E \in M'_\rho \mid \theta)$

holds, where $M'_\rho$ denotes the region according to the system $M'_S$ and $M_\rho$
denotes the region according to the system $M_S$.

Definition 2. Denote by $M_S$ and $M'_S$ two different systems of regions of
acceptance corresponding to the same system of hypotheses. The system $M'_S$
is said to be absolutely better than the system $M_S$ if they are not equivalent and
if for each $\theta$ and for every $\rho$ which is a sum of elements of $S$ which does
not contain $\theta$ the inequality

    $P(E \in M'_\rho \mid \theta) \le P(E \in M_\rho \mid \theta)$

holds.


Definition 3. A system Ms of regions of acceptance is said to be admissible 
if no absolutely better system of regions exists. 
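
For a finite problem, Definitions 1 and 2 can be checked by direct enumeration. The
following is only a sketch under stated assumptions, none of which come from the
paper: the sample point is the number of heads in 3 tosses, $\Omega$ consists of the
two points 0.25 and 0.75, $S$ consists of the two one-point sets, and the inequality
of Definition 2 is read as non-strict, with non-equivalence supplying the strict case.

```python
from itertools import combinations
from math import comb

N_TOSSES = 3
LOW, HIGH = frozenset({0.25}), frozenset({0.75})   # the two elements of S (assumed)
ELEMENTS = (LOW, HIGH)
THETAS = (0.25, 0.75)                              # the points of Omega (assumed)
SAMPLES = range(N_TOSSES + 1)                      # E = number of heads

def density(E, theta):
    """p(E | theta) for the assumed binomial model."""
    return comb(N_TOSSES, E) * theta**E * (1 - theta)**(N_TOSSES - E)

def acceptance_prob(rule, sigma, theta):
    """P(E in M_rho | theta), M_rho being the sum of the regions M_omega, omega in sigma."""
    return sum(density(E, theta) for E in SAMPLES if rule(E) in sigma)

def absolutely_better(rule_new, rule_old, tol=1e-12):
    """Finite analogue of Definition 2: never larger, and smaller at least once."""
    smaller_somewhere = False
    for size in range(1, len(ELEMENTS) + 1):
        for sigma in combinations(ELEMENTS, size):
            rho = frozenset().union(*sigma)
            for theta in THETAS:
                if theta in rho:
                    continue  # only sums rho that do not contain theta are compared
                new = acceptance_prob(rule_new, sigma, theta)
                old = acceptance_prob(rule_old, sigma, theta)
                if new > old + tol:
                    return False
                if new < old - tol:
                    smaller_somewhere = True
    return smaller_somewhere

def sensible(E):
    """Accept HIGH when at least two heads are observed."""
    return HIGH if E >= 2 else LOW

def perverse(E):
    """The same regions with the two hypotheses interchanged."""
    return LOW if E >= 2 else HIGH

print(absolutely_better(sensible, perverse), absolutely_better(perverse, sensible))
```

A system is then admissible within a finite list of candidate systems if
`absolutely_better` returns false for every other candidate compared against it.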


3. The problem of the choice of $M_S$. The choice of $M_S$ will in general be
affected by the following two circumstances:

(1) We do not attribute the same importance to each error. For instance
the acceptance of the hypothesis that $\theta$ lies in a certain interval $I$ has in
general more serious consequences if $\theta$ is far from $I$ than if $\theta$ is near
to $I$. The choice of $M_S$ will in general depend on the relative importance of the
different possible errors.

(2) In some cases we have a priori more confidence that the true parameter
point lies in a certain interval $I$ than in some other cases. The choice of $M_S$
will in general be affected also by this fact. Let us illustrate this by an example.
We have two coins, a new and an old one, and we want to test for both coins
whether the probability $p$ of tossing head is equal to $\tfrac{1}{2}$. Let us assume
that we make 100 tosses with each of the coins and we get head 40 times in each case.
Since we have a priori no very great confidence that the old coin is unbiased,
the fact that head occurred only 40 times will suffice to reject the hypothesis that
for the old coin $p = \tfrac{1}{2}$. But in the case of the new coin, having much
greater a priori confidence that it is unbiased, we shall perhaps not reject the
hypothesis $p = \tfrac{1}{2}$ and we shall rather assume that a somewhat improbable
event occurred. That is to say, we do not choose the same critical region in both
cases due to the fact that our a priori confidence for $p = \tfrac{1}{2}$ is in the
case of the new coin greater than in the case of the old one.
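
The coin example can be put into numbers with a short sketch. The paper gives no
alternative value of $p$ and no numerical degrees of confidence, so the alternative
$p = 0.4$ and the two a priori probabilities used below are purely hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def posterior_fair(prior_fair, heads=40, tosses=100, p_alt=0.4):
    """A posteriori probability of p = 1/2 when the only alternative is p = p_alt.

    prior_fair and p_alt are illustrative assumptions, not values from the text.
    """
    like_fair = binom_pmf(heads, tosses, 0.5)
    like_alt = binom_pmf(heads, tosses, p_alt)
    numerator = prior_fair * like_fair
    return numerator / (numerator + (1 - prior_fair) * like_alt)

# Chance of 40 or fewer heads in 100 tosses of an unbiased coin.
tail = sum(binom_pmf(k, 100, 0.5) for k in range(41))

old_coin = posterior_fair(prior_fair=0.5)     # little a priori confidence
new_coin = posterior_fair(prior_fair=0.99)    # great a priori confidence

print(f"P(heads <= 40 | p = 1/2) = {tail:.4f}")
print(f"a posteriori P(p = 1/2), old coin: {old_coin:.3f}")
print(f"a posteriori P(p = 1/2), new coin: {new_coin:.3f}")
```

With these assumed numbers the same observation leaves the hypothesis
$p = \tfrac{1}{2}$ improbable for the old coin and still probable for the new one.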


In order to study the dependence of the choice of $M_S$ on the two circumstances
mentioned, let us introduce a weight function for the possible errors and an a
priori probability distribution for the unknown parameter $\theta$. The weight
function $W(\theta, \omega)$ is a real valued non-negative function defined for all
points $\theta$ of $\Omega$ and for all elements $\omega$ of $S$, which expresses the
relative importance of the error committed by accepting $H_\omega$ when $\theta$ is
true. If $\theta$ is contained in $\omega$, $W(\theta, \omega)$ is, of course, equal
to zero. The question as to how the form of the weight function $W(\theta, \omega)$
should be determined is not a mathematical or statistical one. The statistician who
wants to test certain hypotheses must first determine the relative importance of all
possible errors, which will entirely depend on the special purposes of his
investigation. If that is done, we shall in general be able to give a more
satisfactory answer to the question as to how the system of regions of acceptance
should be chosen. In many cases, especially in statistical questions concerning
industrial production, we are able to express the importance of an error in terms of
money, that is to say, we can express the loss caused by the error considered in terms
of money. We shall also say that $W(\theta, \omega)$ is the loss caused by accepting
$H_\omega$ when $\theta$ is true.

The situation regarding the introduction of an a priori probability distribution
of $\theta$ is entirely different. First, the objection can be made against it, as
Neyman has pointed out, that $\theta$ is merely an unknown constant and not a variate,
hence it makes no sense to speak of the probability distribution of $\theta$. Second,
even if we may assume that $\theta$ is a variate, we have in general no possibility of
determining the distribution of $\theta$, and any assumptions regarding this
distribution are of hypothetical character. On account of these facts the
determination of the system of regions of acceptance should be independent of any a
priori probability considerations. The "best" system of regions of acceptance, which
we shall define later, will depend only on the weight function of the errors. The
reason why we introduce here a hypothetical probability distribution of $\theta$ is
simply that it proves to be useful in deducing certain theorems and in the calculation
of the best system of regions of acceptance.

Let us denote by $f(\theta)$ a distribution function of $\theta$. For the sake of
simplicity let us assume that the probability density of the distribution
$P(E \in w \mid \theta)$ exists in any point $E$ of the sample space for any $\theta$
and denote it by $p(E \mid \theta)$. The expected value of the loss is given by

(1)   $I = \int_M \int_\Omega W(\theta, \omega_E)\, p(E \mid \theta)\, df(\theta)\, dE$

where $\omega_E$ denotes the element of $S$ corresponding to $E$ (that is to say,
$\omega_E$ is that element of $S$ for which $E$ is a point of the region of acceptance
$M_{\omega_E}$), and the integral is to be taken over the product of the sample space
$M$ with the parameter space $\Omega$. The expected value $I$ of the loss depends on
the system $M_S$ of regions of acceptance. The system $M_S$ for which $I$ becomes a
minimum can be regarded as the best system of regions relative to the given weight
function and to the given a priori distribution of $\theta$.
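
When the sample space and the parameter space are both finite, the double integral in
(1) becomes a double sum, which the sketch below evaluates. The binomial model, the
two parameter points, the 0-1 weight function, the equal a priori probabilities and
the particular system of regions are all assumptions made only for illustration.

```python
from math import comb

N_TOSSES = 3
LOW, HIGH = frozenset({0.25}), frozenset({0.75})   # the two elements of S (assumed)
PRIOR = {0.25: 0.5, 0.75: 0.5}                     # point masses of f(theta) (assumed)

def density(E, theta):
    """p(E | theta): probability of E heads in N_TOSSES tosses."""
    return comb(N_TOSSES, E) * theta**E * (1 - theta)**(N_TOSSES - E)

def weight(theta, omega):
    """W(theta, omega): zero when theta lies in omega, one otherwise."""
    return 0.0 if theta in omega else 1.0

def rule(E):
    """omega_E: the element of S accepted when E is observed."""
    return HIGH if E >= 2 else LOW

def expected_loss(rule, weight, density, prior, samples):
    """Discrete analogue of (1): sum of W(theta, omega_E) p(E | theta) f(theta)."""
    return sum(
        weight(theta, rule(E)) * density(E, theta) * mass
        for E in samples
        for theta, mass in prior.items()
    )

I = expected_loss(rule, weight, density, PRIOR, range(N_TOSSES + 1))
print(I)
```

Replacing `rule` by another assignment of sample points to elements of $S$ changes
the value of `I`, which is the sense in which $I$ depends on the system $M_S$.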

One can easily show the following: If $M'_S$ is an absolutely better system of
regions (in the sense of Definition 2) than the system $M_S$, then for any weight
function $W(\theta, \omega)$ and for any a priori distribution $f(\theta)$ the
expected value $I'$ of the loss corresponding to $M'_S$ is less than the expected
value $I$ of the loss corresponding to $M_S$. (For some exceptional weight and a
priori distribution functions $I'$ may be equal to $I$.)

Hence we can give the following rule: We have to choose an admissible system of
regions of acceptance.

Now let us consider the question whether besides admissibility further
restrictions upon the choice of $M_S$ can be made. In order to see this, let us
consider two admissible systems of regions $M_S$ and $M'_S$ which are not equivalent.
One can easily show that there exist two weight functions $W_1(\theta, \omega)$,
$W_2(\theta, \omega)$ and two a priori distributions $f_1(\theta)$ and $f_2(\theta)$
such that for $W_1(\theta, \omega)$ and $f_1(\theta)$ the expected value of the loss
corresponding to $M_S$ is less than that corresponding to $M'_S$, and for
$W_2(\theta, \omega)$ and $f_2(\theta)$ the expected value of the loss corresponding
to $M_S$ is greater than that corresponding to $M'_S$. Hence no absolute criteria can
be given as to which of the systems $M_S$ and $M'_S$ should be chosen. In order to be
able to make further restrictions upon the choice of $M_S$, we have to make
assumptions regarding the form of the weight function. We shall deal with this
question in section 6.


4. Calculation of admissible systems of regions. As we have seen, we have to 
choose an admissible system of regions. The question arises as to how we can 
find admissible systems of regions. 

Provided that $p(E \mid \theta)$ is continuous in $E$ and $\theta$ jointly, one can
easily show that $M'_S$ is an admissible system of regions if there exists a bounded,
uniformly continuous and everywhere positive (except if $\theta$ is contained in
$\omega$) weight function $W(\theta, \omega)$ and an a priori distribution
$f(\theta)$ such that every open subset of $\Omega$ has a positive probability and
the expected value of the loss

(2)   $I(M_S) = \int_M \int_\Omega W(\theta, \omega_E)\, p(E \mid \theta)\, df(\theta)\, dE$

becomes a minimum for $M_S = M'_S$. ($\omega_E$ denotes that element of $S$ for which
$M_{\omega_E}$ contains $E$.) In fact, if there existed an absolutely better system
$M''_S$ of regions, then $I(M''_S)$ would be less than $I(M'_S)$, in contradiction to
our assumption that $I(M_S)$ becomes a minimum for $M_S = M'_S$.

In order to obtain an admissible system $M_S$ we may choose any bounded,
uniformly continuous and everywhere positive (except if $\theta$ is contained in
$\omega$) weight function $W(\theta, \omega)$ and any arbitrary a priori distribution
$f(\theta)$ (subject only to the condition that every open subset of $\Omega$ should
have a positive probability), and then the system $M_S$ which makes

    $I(M_S) = \int_M \int_\Omega W(\theta, \omega_E)\, p(E \mid \theta)\, df(\theta)\, dE$

a minimum is an admissible one. In order to determine $M_S$ we have only to
determine for each $E$ the corresponding element $\omega_E$ of $S$. Let us consider
the integral

    $I_\omega = \int_\Omega W(\theta, \omega)\, p(E \mid \theta)\, df(\theta).$

The integral $I_\omega$ is for a fixed $E$ only a function of $\omega$. It is obvious
that $\omega_E$ must be that element of $S$ for which $I_\omega$ becomes a minimum.
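
The recipe just described, choosing for each sample point the element of $S$ that
minimizes the integral over the parameter space, can be sketched for a finite problem
as follows. The binomial model, the two-point parameter space, the 0-1 weight function
and the equal a priori probabilities are illustrative assumptions; the integral is
replaced by a sum over the assumed parameter points.

```python
from math import comb

N_TOSSES = 3
LOW, HIGH = frozenset({0.25}), frozenset({0.75})   # the two elements of S (assumed)
ELEMENTS = (LOW, HIGH)
PRIOR = {0.25: 0.5, 0.75: 0.5}                     # every point has positive probability

def density(E, theta):
    """p(E | theta): probability of E heads in N_TOSSES tosses."""
    return comb(N_TOSSES, E) * theta**E * (1 - theta)**(N_TOSSES - E)

def weight(theta, omega):
    """W(theta, omega): positive except when theta is contained in omega."""
    return 0.0 if theta in omega else 1.0

def integral_I(omega, E):
    """Discrete analogue of the integral taken over the parameter space for fixed E."""
    return sum(weight(theta, omega) * density(E, theta) * mass
               for theta, mass in PRIOR.items())

def best_element(E):
    """omega_E: the element of S for which the integral becomes a minimum."""
    return min(ELEMENTS, key=lambda omega: integral_I(omega, E))

# The regions of acceptance are read off from the map E -> omega_E.
regions = {E: best_element(E) for E in range(N_TOSSES + 1)}
for E, omega in regions.items():
    print(E, sorted(omega))
```

Changing `PRIOR` or `weight` moves the boundary between the two regions, but the
minimization itself is carried out separately for each sample point.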


5. Admissible systems $M_S$ and the Neyman-Pearson best critical regions.
Let us consider the case that the system $H_S$ of hypotheses consists only of the
following two hypotheses: 1) $\theta = \theta_1$, where $\theta_1$ is a certain point
of $\Omega$. 2) $\theta$ belongs to the set complementary to $\theta_1$. Let us denote
by $\omega_1$ the set consisting only of the point $\theta_1$, and by $\omega_2$ the
set complementary to $\omega_1$. $S$ consists in this case only of two elements
$\omega_1$ and $\omega_2$. The system $M_S$ of regions consists of two regions of
acceptance $M_{\omega_1}$ and $M_{\omega_2}$ corresponding to the hypotheses
$H_{\omega_1}$ and $H_{\omega_2}$. If a common best critical region in the sense of
Neyman-Pearson exists and if $M_S$ is admissible, then $M_{\omega_2}$ is obviously a
common best critical region. This leads to the following remarkable conclusion: If a
common best critical region exists and if the system $M_S$ of regions consisting of
the two regions $M_{\omega_1}$ and $M_{\omega_2}$ minimizes the expectation of the
loss (formula 2) for a weight function and for an a priori distribution subject to
some weak conditions mentioned in paragraph 4, then $M_{\omega_2}$ is a common best
critical region. That is to say, the form of the weight function and of the a priori
distribution affects only the size of the region $M_{\omega_2}$, but it will always
be a common best critical region.
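
As a worked illustration of how the minimizing rule of section 4 specializes to this
two-element case, suppose (an assumption made here only for the sake of the
computation) that the a priori distribution assigns a positive probability $\pi_1$ to
the point $\theta_1$. Since the weight function vanishes whenever the true point lies
in the accepted set, the two integrals to be compared for a fixed sample point reduce
as follows.

```latex
\begin{align*}
I_{\omega_1} &= \int_{\omega_2} W(\theta, \omega_1)\, p(E \mid \theta)\, df(\theta),
  && \text{because } W(\theta_1, \omega_1) = 0, \\
I_{\omega_2} &= W(\theta_1, \omega_2)\, p(E \mid \theta_1)\, \pi_1,
  && \text{because } W(\theta, \omega_2) = 0 \text{ for } \theta \in \omega_2 .
\end{align*}
% E is assigned to the critical region M_{\omega_2} when I_{\omega_2} < I_{\omega_1}:
\[
W(\theta_1, \omega_2)\, \pi_1\, p(E \mid \theta_1)
  \;<\; \int_{\omega_2} W(\theta, \omega_1)\, p(E \mid \theta)\, df(\theta).
\]
```

The weight function and the a priori distribution enter this inequality only through
the constant on the left and the averaging measure on the right, which is consistent
with the remark that they affect the size of the critical region.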


6. The choice of $M_S$ if a weight function is given. We shall now consider the
case in which a weight function $W(\theta, \omega)$ is given, and we shall deal with
the question as to how $M_S$ in this case is to be chosen.

If the parameter point is an unknown constant and if $\theta$ denotes the true
parameter point, then the expected value of the loss is given by

(3)   $r(\theta) = \int_M W(\theta, \omega_E)\, p(E \mid \theta)\, dE$

where the integration is to be taken over the whole sample space $M$ and
$H_{\omega_E}$ denotes the hypothesis accepted if $E$ is the observed sample point.
That is to say, $\omega_E$ is that element of $S$ for which $E$ is contained in the
region of acceptance $M_{\omega_E}$. We shall call the expression (3) the risk of
accepting a false hypothesis if $\theta$ is the true parameter point. Since we do not
know the true parameter point $\theta$, we shall have to study the risk $r(\theta)$
as a function of $\theta$. We shall call this function the risk function. The form of
the risk function depends on the system $M_S$ of regions and on the form of the
weight function. In order to express this fact, we shall denote the risk function
corresponding to the system $M_S$ and to the weight function $W(\theta, \omega)$ also
by

    $r[\theta \mid M_S, W(\theta, \omega)].$


Definition 4. Denote by $M_S$ and $M'_S$ two systems of regions of acceptance
corresponding to the same system $H_S$ of hypotheses. We shall say that $M_S$
and $M'_S$ are equivalent relative to the weight function $W(\theta, \omega)$ if the
risk function $r[\theta \mid M_S, W(\theta, \omega)]$ is identically equal to the risk
function $r[\theta \mid M'_S, W(\theta, \omega)]$, that is to say if for each point
$\theta$,

    $r[\theta \mid M_S, W(\theta, \omega)] = r[\theta \mid M'_S, W(\theta, \omega)].$


Definition 5. Denote by $M_S$ and $M'_S$ two systems of regions corresponding
to the same system $H_S$ of hypotheses. We shall say that $M_S$ is uniformly
better than $M'_S$ relative to the weight function $W(\theta, \omega)$ if $M_S$ and
$M'_S$ are not equivalent and for each $\theta$

    $r[\theta \mid M_S, W(\theta, \omega)] \le r[\theta \mid M'_S, W(\theta, \omega)].$


Definition 6. A system $M_S$ of regions of acceptance is said to be admissible
relative to the weight function $W(\theta, \omega)$ if no uniformly better system of
regions exists relative to the weight function considered.

It is obvious that we have to choose a system $M_S$ of regions which is admissible
relative to the weight function considered.

There exist in general many systems $M_S$ which are admissible relative to the
weight function given. The question arises as to how we can distinguish among
them. Denote by $r_{M_S}$ the maximum of the risk function corresponding to the
system $M_S$ of regions and to the given weight function. If we do not take into
consideration a priori probabilities of $\theta$, then it seems reasonable to choose
that system $M_S$ for which $r_{M_S}$ becomes a minimum. We shall see in section 8
that the system $M_S$ for which $r_{M_S}$ becomes a minimum has some important
properties which justify the distinction of this particular system of regions among
all admissible systems.

Definition 7. We shall call an admissible system $M_S$ of regions for which
$r_{M_S}$ becomes a minimum a best system of regions of acceptance relative to the
weight function given.⁴
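
The comparison of systems of regions by the maximum of their risk functions can be
sketched numerically. The model (number of heads in 3 tosses), the two parameter
points, the 0-1 weight function and the restriction to a small family of threshold
rules are all assumptions made for illustration; in particular the sketch compares
maxima of risk only and does not verify admissibility.

```python
from math import comb

N_TOSSES = 3
LOW, HIGH = frozenset({0.25}), frozenset({0.75})   # the two elements of S (assumed)
THETAS = (0.25, 0.75)                              # the points of Omega (assumed)

def density(E, theta):
    """p(E | theta): probability of E heads in N_TOSSES tosses."""
    return comb(N_TOSSES, E) * theta**E * (1 - theta)**(N_TOSSES - E)

def weight(theta, omega):
    """W(theta, omega): zero when theta lies in omega, one otherwise."""
    return 0.0 if theta in omega else 1.0

def threshold_rule(t):
    """System of regions that accepts HIGH exactly when E >= t."""
    return lambda E: HIGH if E >= t else LOW

def risk(theta, rule):
    """Discrete analogue of (3): expected loss when theta is the true point."""
    return sum(weight(theta, rule(E)) * density(E, theta) for E in range(N_TOSSES + 1))

def max_risk(rule):
    """Maximum of the risk function over the parameter points."""
    return max(risk(theta, rule) for theta in THETAS)

# Among the candidate thresholds, pick the one whose maximum risk is smallest.
best_t = min(range(N_TOSSES + 2), key=lambda t: max_risk(threshold_rule(t)))
print(best_t, max_risk(threshold_rule(best_t)))
```

No a priori distribution appears anywhere in this computation; only the weight
function and the risk as a function of the true parameter point are used.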

Now we shall have to deal with the question of how a best system $M_S$ of regions
can be determined and what special properties this system $M_S$ has.

7. Reduction of the problem to the case when the system $H_S$ of hypotheses is
the system of all simple hypotheses. A hypothesis $H_\omega$ is said to be a simple
hypothesis if $\omega$ contains exactly one point of the parameter space $\Omega$. We
assume that each element $\omega$ of $S$ is a closed subset of $\Omega$. Hence the
power of $S$ is not greater than the power of the continuum, and therefore we can
always set up a correspondence between the elements $\omega$ of $S$ and the points
$\theta$ of $\Omega$ such that to each point $\theta$ corresponds a certain element
$\omega_\theta$ of $S$ and to each element $\omega$ of $S$ at least one point $\theta$
exists for which $\omega_\theta = \omega$. For instance, if $S$ consists of the two
elements $\omega_1$ and $\omega_2$, then we can set up a correspondence as follows:
the element $\omega_\theta$ of $S$ corresponding to $\theta$ is $\omega_1$ if $\theta$
is contained in $\omega_1$, and $\omega_2$ otherwise. If $\Omega$ is one dimensional
and $S$ is the system of all intervals of a certain length $c$, then we can define the
interval $\omega_\theta$ corresponding to $\theta$ as the interval of which the
initial point is $\theta$ and the terminal point $\theta + c$.

⁴ As we shall see later (Theorem 3), the best system of regions is uniquely determined
if some regularity conditions are fulfilled.

Let us denote the weight function by $W(\theta, \omega)$, defined for all values of
$\theta$ and for all elements $\omega$ of $S$. Consider the system $H_\Omega$ of all
simple hypotheses and the following weight function

(4)   $W(\theta, \bar\theta) = W(\theta, \omega_{\bar\theta})$

where $\theta$ denotes the true parameter point and $\bar\theta$ denotes the estimated
point. A system $M_\Omega$ of regions of acceptance for $H_\Omega$ is given by a
vector function $\bar\theta(E)$ of the observations such that to each point
$E = (x_1, \dots, x_n)$ of the sample space $M$ corresponds a certain point
$\bar\theta(E)$ of the parameter space. For each point $\theta_1$ the region
$M_{\theta_1}$ of acceptance of the hypothesis $\theta = \theta_1$ is given by the
equation $\bar\theta(E) = \theta_1$. We shall call the function $\bar\theta(E)$ an
estimate of $\theta$; the system of regions $M_\Omega$ is uniquely determined by the
estimate. We shall call $\bar\theta(E)$ a best estimate relative to a given weight
function if the system of regions determined by $\bar\theta(E)$ is a best system of
regions relative to the weight function considered.

Let us denote by $\bar\theta(E)$ a best estimate of $\theta$ relative to the weight
function $W(\theta, \bar\theta)$ defined in (4). A best system $M_S$ of regions of
acceptance in the original problem can obviously be obtained in the following way:
Denote by $\omega$ an element of $S$. The region $M_\omega$ of acceptance of the
hypothesis $H_\omega$ consists of the points $E$ for which

    $\omega_{\bar\theta(E)} = \omega.$

Hence we can restrict our considerations to the case when the system of hypotheses
is the system of all simple hypotheses. We shall deal with the problem of how a best
estimate of $\theta$ can be found and what properties this estimate has.
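
The reduction can be sketched for the interval example of this section. The interval
length (written `c` below), the particular weight function on intervals and the
estimate used in the usage line are hypothetical choices made only to show how a
weight function on elements of $S$ induces a weight function on points, and how an
estimate determines the accepted element of $S$.

```python
def omega(theta_bar, c=0.1):
    """Element of S attached to the point theta_bar: the interval [theta_bar, theta_bar + c]."""
    return (theta_bar, theta_bar + c)

def weight_on_sets(theta, interval):
    """Hypothetical W(theta, omega): distance from theta to the interval, zero inside it."""
    low, high = interval
    return max(low - theta, 0.0, theta - high)

def weight_on_points(theta, theta_bar, c=0.1):
    """Weight function (4): the loss of estimating theta_bar is the loss of accepting
    the element of S attached to theta_bar."""
    return weight_on_sets(theta, omega(theta_bar, c))

def accepted_interval(estimate, E, c=0.1):
    """Original problem: accept the element of S attached to the estimate at E."""
    return omega(estimate(E), c)

# Usage with an assumed estimate: observed count divided by 10.
print(weight_on_points(0.35, 0.3), weight_on_points(0.5, 0.3))
print(accepted_interval(lambda E: E / 10, 3))
```

Any estimate that is best for `weight_on_points` then yields, through
`accepted_interval`, a system of regions for the original system of hypotheses.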


8. Some theorems concerning the best estimate. In order to study the
properties of a best estimate $\bar\theta(E)$ it is useful to consider hypothetical a
priori distributions of $\theta$. We shall especially consider point distributions of
$\theta$, that is to say, distributions where a finite number of points
$\theta_1, \dots, \theta_s$ of the parameter space $\Omega$ exist such that the
probability of any subset of $\Omega$ not containing any of the points
$\theta_1, \dots, \theta_s$ is zero. If $\theta_1, \dots, \theta_s$ are given, a point
distribution is characterized by a vector $p = (p_1, \dots, p_s)$, where $p_i$ denotes
the probability of $\theta_i$ and $\sum p_i = 1$.

If $\bar\theta(E)$ denotes an estimate of $\theta$ and if $f(\theta)$ denotes a
distribution function of $\theta$, then the expected value of the loss, that is to say
the expected value of the weight function $W[\theta, \bar\theta(E)]$, is obviously
given by

(5)   $\int_M \int_\Omega W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, df(\theta)\, dE$

where $p(E \mid \theta)$ denotes the probability density in $E$ if $\theta$ is the
true parameter point and the integration is to be taken over the product of the sample
space $M$ and parameter space $\Omega$.
Let us assume that for every sample point $E$ there exists a parameter point
$\theta_f(E)$ such that the expression

(6)   $\int_\Omega W(\theta, \bar\theta)\, p(E \mid \theta)\, df(\theta)$

becomes a minimum with respect to $\bar\theta$ for $\bar\theta = \theta_f(E)$. We
shall call the estimate $\theta_f(E)$ a minimum risk estimate with respect to the
distribution $f(\theta)$, since also the expression (5) becomes a minimum for the
estimate $\theta_f(E)$.
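
For a point distribution the expression (6) is a finite sum, and a minimum risk
estimate can be approximated by a search over a grid of candidate points. The sketch
below assumes, for illustration only, a binomial model with 10 trials, three parameter
points with given probabilities, and a squared-error weight function; with that
particular weight function the minimizing point coincides with the a posteriori mean,
which serves as a check on the search.

```python
from math import comb

N_TOSSES = 10
POINTS = (0.2, 0.5, 0.8)      # theta_1, ..., theta_s (illustrative)
PROBS = (0.3, 0.4, 0.3)       # p_1, ..., p_s, summing to 1 (illustrative)

def density(E, theta):
    """p(E | theta): probability of E successes in N_TOSSES trials."""
    return comb(N_TOSSES, E) * theta**E * (1 - theta)**(N_TOSSES - E)

def weight(theta, theta_bar):
    """Assumed weight function: squared distance between true and estimated point."""
    return (theta - theta_bar) ** 2

def expression_6(theta_bar, E):
    """Expression (6) for the point distribution: a sum over theta_1, ..., theta_s."""
    return sum(weight(t, theta_bar) * density(E, t) * p for t, p in zip(POINTS, PROBS))

def minimum_risk_estimate(E, grid_size=1000):
    """Approximate minimizer of (6) over a grid on the interval [0, 1]."""
    grid = [i / grid_size for i in range(grid_size + 1)]
    return min(grid, key=lambda theta_bar: expression_6(theta_bar, E))

def posterior_mean(E):
    """Closed-form minimizer under squared error, used here only as a check."""
    masses = [density(E, t) * p for t, p in zip(POINTS, PROBS)]
    return sum(t * m for t, m in zip(POINTS, masses)) / sum(masses)

estimate = minimum_risk_estimate(7)
print(estimate, posterior_mean(7))
```

The minimization is carried out for each sample point separately, exactly as the
element of $S$ was chosen separately for each sample point in section 4.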

We shall make the following assumptions:

Assumption 1. The parameter space is a bounded and closed subset of the
$k$-dimensional Euclidean space.

Assumption 2. The weight function $W(\theta, \bar\theta)$ is continuous in $\theta$
and $\bar\theta$ jointly.

Assumption 3. The probability density $p(E \mid \theta)$ is continuous in $E$ and
$\theta$ jointly. That is to say, if $\lim E_i = E$ and $\lim \theta_i = \theta$ then
$\lim p(E_i \mid \theta_i) = p(E \mid \theta)$.

Assumption 4. For any distribution $f(\theta)$ of $\theta$ there exists at most one
minimum risk estimate $\theta_f(E)$.⁵

Assumption 5. If $f(\theta)$ and $f'(\theta)$ denote two different point distributions
of $\theta$ and if $\theta_f(E)$ and $\theta_{f'}(E)$ are minimum risk estimates
corresponding to $f(\theta)$ and $f'(\theta)$ respectively, then $\theta_f(E)$ is not
identically equal to $\theta_{f'}(E)$.

The assumptions 1-5, with addition of an assumption 6 which we shall formulate
later, enable us to deduce important properties of the best estimate $\bar\theta(E)$.
First we shall prove some Lemmas by means of the assumptions 1-5.

Lemma 1. For any a priori distribution $f(\theta)$ of $\theta$ there exists exactly
one minimum risk estimate $\theta_f(E)$.

According to Assumption 2, $W(\theta, \bar\theta)$ is continuous. Since the parameter
space $\Omega$ is compact on account of Assumption 1, $W(\theta, \bar\theta)$ is
uniformly continuous. According to Assumption 3, $p(E \mid \theta)$ is continuous;
hence for any fixed sample point $E$, $p(E \mid \theta)$ is bounded. From these facts
it follows easily that the expression (6) is a continuous function of $\bar\theta$
for any fixed sample point $E$. Hence there exists at least one parameter point
$\theta_f(E)$ such that (6) becomes a minimum for $\bar\theta = \theta_f(E)$. Since,
according to Assumption 4, at most one parameter point exists for which (6) becomes a
minimum, Lemma 1 is proved.

If a distribution $f(\theta)$ of $\theta$ is given, then the distribution of each of
the components $\theta^{(1)}, \dots, \theta^{(k)}$ of $\theta$ can be found. Denote by
$Q_j$ the set of real numbers which are discontinuities of the distribution of the
component $\theta^{(j)}$ $(j = 1, \dots, k)$ and form the set
$Q = Q_1 + \dots + Q_k$. As is well known, $Q$ is at most denumerable.
A $k$-dimensional interval $J$ of the parameter space given by

    $a_j < \theta^{(j)} < b_j \qquad (j = 1, \dots, k)$

is called a continuity interval of the distribution $f(\theta)$ if no $a_j$ and no
$b_j$ belongs to $Q$. A sequence $\{f_n(\theta)\}$ of distributions is said to be
convergent towards the distribution $f(\theta)$, i.e. in symbols
$\lim f_n(\theta) = f(\theta)$, if for any continuity interval $J$ of $f(\theta)$ the
probability of $J$ corresponding to the distribution $f_n(\theta)$ converges with
increasing $n$ towards the probability of $J$ corresponding to the distribution
$f(\theta)$.

⁵ As will be shown in Section 10, Assumption 4 is not as restrictive as it would appear.
It will be satisfied in the great majority of practical cases.

Lemma 2. If $\{f_n(\theta)\}$ $(n = 1, \dots, \text{ad inf.})$ denotes a sequence of
distributions, then there exists a subsequence $\{f_{n_m}(\theta)\}$
$(m = 1, \dots, \text{ad inf.})$ which converges towards a distribution.

As is well known, there exists a completely additive set function $P(\omega)$ defined
for all Borel measurable subsets $\omega$ of $\Omega$ and a subsequence $\{n_m\}$ of
$\{n\}$, such that for any continuity interval $J$ of $P(\omega)$ the probability of
$J$ corresponding to the distribution $f_{n_m}(\theta)$ converges with increasing $m$
towards $P(J)$. Since $\Omega$ is bounded, there exists a continuity interval $J$ such
that for all $n$ the probability of $J$ according to $f_n(\theta)$ is equal to 1.
Hence $P(\Omega) = 1$, that is to say, $P(\omega)$ is a probability set function,
which proves Lemma 2.

Lemma 3. If $\{f_n(\theta)\}$ $(n = 1, \dots, \text{ad inf.})$ denotes a sequence of
distributions which converges towards the distribution $f(\theta)$ and if
$\lim E_n = E$, then

    $\lim \theta_{f_n}(E_n) = \theta_f(E),$

where $\theta_{f_n}(E)$ denotes the minimum risk estimate corresponding to
$f_n(\theta)$ and $\theta_f(E)$ denotes the minimum risk estimate corresponding to
$f(\theta)$.


If $\{\varphi_n(\theta)\}$ denotes a sequence of real valued functions which converges
uniformly towards a continuous function $\varphi(\theta)$, then

(7)   $\lim \int_\Omega \varphi_n(\theta)\, df_n(\theta) = \int_\Omega \varphi(\theta)\, df(\theta).$

Since $\{\varphi_n(\theta)\}$ converges uniformly towards $\varphi(\theta)$, (7) is
obviously true if

    $\lim \int_\Omega \varphi(\theta)\, df_n(\theta) = \int_\Omega \varphi(\theta)\, df(\theta)$

holds. The latter equality follows easily from the fact that $\Omega$ is compact.
Consider a subsequence $\{n_m\}$ of $\{n\}$ such that
$\lim \theta_{f_{n_m}}(E_{n_m})$ exists. Denote this limit by $\theta^*$. In order to
prove Lemma 3, we have only to show that $\theta^* = \theta_f(E)$. If
$\theta_f(E) \neq \theta^*$, then on account of Assumption 4

(8)   $\int_\Omega W[\theta, \theta_f(E)]\, p(E \mid \theta)\, df(\theta) < \int_\Omega W(\theta, \theta^*)\, p(E \mid \theta)\, df(\theta).$

$W(\theta, \bar\theta)$ is uniformly continuous since $\Omega$ is compact. On account
of Assumption 3 also $p(E \mid \theta)$ is uniformly continuous in the product of
$\Omega$ with a bounded subset of the sample space. Hence


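    $W[\theta, \theta_{f_{n_m}}(E_{n_m})]\, p(E_{n_m} \mid \theta)$

converges uniformly in $\theta$ towards

    $W(\theta, \theta^*)\, p(E \mid \theta)$

and we have on account of (7) and (8)

(9)   $\lim \int_\Omega W[\theta, \theta_{f_{n_m}}(E_{n_m})]\, p(E_{n_m} \mid \theta)\, df_{n_m} = \int_\Omega W(\theta, \theta^*)\, p(E \mid \theta)\, df > \int_\Omega W[\theta, \theta_f(E)]\, p(E \mid \theta)\, df,$

and

(10)  $\lim \int_\Omega W[\theta, \theta_f(E)]\, p(E \mid \theta)\, df_{n_m} = \int_\Omega W[\theta, \theta_f(E)]\, p(E \mid \theta)\, df.$

From (9) and (10) it follows that there exists a positive $\delta$ such that for
sufficiently large $m$

    $\int_\Omega W[\theta, \theta_{f_{n_m}}(E_{n_m})]\, p(E_{n_m} \mid \theta)\, df_{n_m} > \int_\Omega W[\theta, \theta_f(E)]\, p(E \mid \theta)\, df_{n_m} + \delta.$

Since the sequence of functions $\{p(E_n \mid \theta)\}$ converges uniformly in
$\theta$ towards $p(E \mid \theta)$, we have for sufficiently large $m$

    $\int_\Omega W[\theta, \theta_{f_{n_m}}(E_{n_m})]\, p(E_{n_m} \mid \theta)\, df_{n_m} > \int_\Omega W[\theta, \theta_f(E)]\, p(E_{n_m} \mid \theta)\, df_{n_m}.$

But this is a contradiction, since $\theta_{f_n}(E)$ is a minimum risk estimate. Hence
the assumption $\theta^* \neq \theta_f(E)$ is proved to be an absurdity. This proves
Lemma 3.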

Lemma 4. To each positive $\epsilon$ a bounded and closed subset $M_\epsilon$ of the
sample space $M$ can be given such that

    $\int_{M_\epsilon} p(E \mid \theta)\, dE > 1 - \epsilon$

for every point $\theta$ of the parameter space $\Omega$.

Let us assume that Lemma 4 is not true and we shall deduce a contradiction.
Denote by $M_\nu$ $(\nu = 1, 2, \dots, \text{ad inf.})$ the sphere in the sample space
$M$ whose center is the origin and whose radius is equal to $\nu$. Since Lemma 4 is
supposed to be not true, there is a positive $\epsilon$ such that to each $\nu$ there
exists a parameter point $\theta_\nu$ for which

(11)  $\int_{M_\nu} p(E \mid \theta_\nu)\, dE \le 1 - \epsilon \qquad (\nu = 1, \dots, \text{ad inf.}).$

Since $\Omega$ is compact, there exists a subsequence $\{\theta_{\nu_\mu}\}$ of the
sequence $\{\theta_\nu\}$ such that $\lim \theta_{\nu_\mu}$ exists. Denote
$\lim_{\mu = \infty} \theta_{\nu_\mu}$ by $\bar\theta$. Since

    $\int_M p(E \mid \bar\theta)\, dE = 1$

there exists a positive integer $\nu'$ such that

    $\int_{M_{\nu'}} p(E \mid \bar\theta)\, dE > 1 - \frac{\epsilon}{2}.$

On account of Assumption 3 we get easily

    $\lim_{\mu = \infty} \int_{M_{\nu'}} p(E \mid \theta_{\nu_\mu})\, dE = \int_{M_{\nu'}} p(E \mid \bar\theta)\, dE.$

Hence for sufficiently large $\mu$ we get

    $\int_{M_{\nu_\mu}} p(E \mid \theta_{\nu_\mu})\, dE \ge \int_{M_{\nu'}} p(E \mid \theta_{\nu_\mu})\, dE > 1 - \epsilon,$

in contradiction to (11). This proves Lemma 4.
For any estimate $\bar\theta(E)$ we shall call the integral

    $r(\theta) = \int_M W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE$

the risk function of the estimate $\bar\theta(E)$. The value of the risk function
$r(\theta)$ is for any $\theta$ equal to the expected value of the loss (of the
weight function) if $\theta$ is the true parameter point.
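
A minimal sketch of a risk function of an estimate, under assumptions that are not in
the paper: the sample point is the number of successes in 10 trials, the estimate is
the observed proportion, and the weight function is the squared error. For this
particular combination the risk is known in closed form, $\theta(1 - \theta)/n$,
which the sum below reproduces.

```python
from math import comb

N_TRIALS = 10

def density(E, theta):
    """p(E | theta): probability of E successes in N_TRIALS trials."""
    return comb(N_TRIALS, E) * theta**E * (1 - theta)**(N_TRIALS - E)

def estimate(E):
    """Assumed estimate: the observed proportion of successes."""
    return E / N_TRIALS

def weight(theta, theta_bar):
    """Assumed weight function: squared error."""
    return (theta - theta_bar) ** 2

def risk(theta):
    """Risk function of the estimate: expected loss when theta is the true point."""
    return sum(weight(theta, estimate(E)) * density(E, theta) for E in range(N_TRIALS + 1))

for theta in (0.1, 0.3, 0.5):
    print(theta, risk(theta), theta * (1 - theta) / N_TRIALS)
```

Evaluating `risk` at two nearby parameter points gives nearly equal values, which is
the kind of continuity of the risk function that Lemma 5 establishes in general.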

Lemma 5. To any positive $\eta$ a positive $\delta$ can be given such that for any
estimate $\bar\theta(E)$ and for any pair $\theta$, $\theta'$ of parameter points whose
Euclidean distance is less than $\delta$ the inequality

    $|r(\theta) - r(\theta')| = \left| \int_M W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE - \int_M W[\theta', \bar\theta(E)]\, p(E \mid \theta')\, dE \right| < \eta$

holds.

Since $W(\theta, \bar\theta)$ is uniformly continuous, to any $\epsilon > 0$ a
positive $\delta$ can be given such that for any pair of points $\theta$, $\theta'$
whose Euclidean distance is less than $\delta$ the relation

(12)  $|W(\theta, \bar\theta) - W(\theta', \bar\theta)| < \epsilon$

holds for every $\bar\theta$. On account of Assumption 3, $\delta$ can be chosen in
such a way that also the inequality

(13)  $|p(E \mid \theta) - p(E \mid \theta')| < \epsilon$

is satisfied for any sample point $E$ of a bounded subset $M'$ of $M$ and for any
pair $\theta$, $\theta'$ whose Euclidean distance is less than $\delta$.

Since $W(\theta, \bar\theta)$ is continuous and $\Omega$ is compact,
$W(\theta, \bar\theta)$ must be bounded. Denote by $A$ an upper bound of
$W(\theta, \bar\theta)$. According to Lemma 4 there exists a bounded and closed subset
$M'$ of the sample space $M$ such that

    $\int_{M'} p(E \mid \theta)\, dE > 1 - \frac{\eta}{2A}$   for any $\theta$.

It is obvious that

    $\left| \int_{M - M'} W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE - \int_{M - M'} W[\theta', \bar\theta(E)]\, p(E \mid \theta')\, dE \right| < \frac{\eta}{2}.$

In order to prove Lemma 5 we have only to show that

(14)  $\left| \int_{M'} W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE - \int_{M'} W[\theta', \bar\theta(E)]\, p(E \mid \theta')\, dE \right| < \frac{\eta}{2}.$

On account of (12) and (13), (14) is certainly true for sufficiently small $\epsilon$.
Hence Lemma 5 is proved.

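Lemma 6. If the sequence $\{f_n(\theta)\}$ of distributions converges towards the
distribution $f(\theta)$ and if $r_{f_n}(\theta)$ denotes the risk function of the minimum risk estimate
$\bar\theta_{f_n}(E)$ then $\{r_{f_n}(\theta)\}$ converges uniformly towards the risk function $r_f(\theta)$ of the
minimum risk estimate $\bar\theta_f(E)$.

According to Lemma 4 to any positive $\epsilon$ a bounded and closed subset $M_\epsilon$
of $M$ can be given such that

$$\int_{M_\epsilon} p(E \mid \theta)\, dE > 1 - \epsilon \tag{15}$$

for every $\theta$. From Lemma 3 it follows easily that $\{\bar\theta_{f_n}(E)\}$ converges
uniformly towards $\bar\theta_f(E)$ in $M_\epsilon$. Hence

$$\lim_{n = \infty} \int_{M_\epsilon} W[\theta, \bar\theta_{f_n}(E)]\, p(E \mid \theta)\, dE = \int_{M_\epsilon} W[\theta, \bar\theta_f(E)]\, p(E \mid \theta)\, dE$$

holds for every $\theta$ and for every positive $\epsilon$. Since $W(\theta, \bar\theta)$ is bounded and $\epsilon$ can
be chosen arbitrarily small, we get on account of (15) that

$$\lim_{n = \infty} \int_M W[\theta, \bar\theta_{f_n}(E)]\, p(E \mid \theta)\, dE = \int_M W[\theta, \bar\theta_f(E)]\, p(E \mid \theta)\, dE,$$

that is to say

$$\lim_{n = \infty} r_{f_n}(\theta) = r_f(\theta).$$

The uniformity of the convergence follows easily from Lemma 5.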
In the following argument we shall consider an arbitrary but fixed system of $s$
parameter points $\theta_1, \ldots, \theta_s$, and point distributions such that no point
$\theta \ne \theta_1, \ldots, \theta_s$ has positive probability. Such a point distribution is
characterized by a vector $\rho = (\rho_1, \ldots, \rho_s)$ where $\rho_i$ denotes the probability of $\theta_i$
($i = 1, \ldots, s$) and $\sum \rho_i = 1$. The points $\theta_1, \ldots, \theta_s$ are kept constant and
only $\rho$ will vary. Hence if we speak about different distributions
$\rho = (\rho_1, \ldots, \rho_s)$, $\rho' = (\rho_1', \ldots, \rho_s')$ they are always related to the same points
$\theta_1, \ldots, \theta_s$ unless we state explicitly the contrary.


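Lemma 7. If $\rho = (\rho_1, \ldots, \rho_s)$ and $\rho' = (\rho_1 + \Delta\rho_1, \ldots, \rho_s + \Delta\rho_s)$ denote
two different distributions then

$$\sum_{i=1}^{s} [(\lambda - 1)\rho_i + \lambda\,\Delta\rho_i]\,[r_i(\rho') - r_i(\rho)] < 0$$

holds for any positive $\lambda$, where

$$r_i(\rho) = \int_M W[\theta_i, \bar\theta_\rho(E)]\, p(E \mid \theta_i)\, dE \quad (i = 1, \ldots, s),$$

$$r_i(\rho') = \int_M W[\theta_i, \bar\theta_{\rho'}(E)]\, p(E \mid \theta_i)\, dE,$$

and $\bar\theta_\rho(E)$ and $\bar\theta_{\rho'}(E)$ denote the minimum risk estimates corresponding to $\rho$ and $\rho'$
respectively.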


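We have, writing $\rho_i' = \rho_i + \Delta\rho_i$,

$$\sum (\rho_i + \Delta\rho_i)\, r_i(\rho) = \int_M \sum W[\theta_i, \bar\theta_\rho(E)]\, \rho_i'\, p(E \mid \theta_i)\, dE = I_1,$$

$$\sum (\rho_i + \Delta\rho_i)\, r_i(\rho') = \int_M \sum W[\theta_i, \bar\theta_{\rho'}(E)]\, \rho_i'\, p(E \mid \theta_i)\, dE = I_2.$$

Since $\bar\theta_{\rho'}(E)$ is the minimum risk estimate corresponding to $\rho'$, we have $I_1 \ge I_2$.
We shall show that $I_1 > I_2$. According to Assumption 5 $\bar\theta_\rho(E)$ is not identically
equal to $\bar\theta_{\rho'}(E)$. Hence there exists a point $E'$ such that $\bar\theta_\rho(E') \ne \bar\theta_{\rho'}(E')$. On
account of Assumption 4

$$\sum W[\theta_i, \bar\theta_\rho(E')]\, \rho_i'\, p(E' \mid \theta_i) > \sum W[\theta_i, \bar\theta_{\rho'}(E')]\, \rho_i'\, p(E' \mid \theta_i).$$

From Lemma 3 it follows that $\bar\theta_\rho(E)$ and $\bar\theta_{\rho'}(E)$ are continuous functions of $E$.
Hence there exists a positive $\delta$ and a sphere $S$ with center in $E'$ such that

$$\sum W[\theta_i, \bar\theta_\rho(E)]\, \rho_i'\, p(E \mid \theta_i) > \sum W[\theta_i, \bar\theta_{\rho'}(E)]\, \rho_i'\, p(E \mid \theta_i) + \delta$$

for every point $E$ of $S$. Since $\bar\theta_{\rho'}(E)$ is the minimum risk estimate
corresponding to $\rho'$ we have

$$\sum W[\theta_i, \bar\theta_\rho(E)]\, \rho_i'\, p(E \mid \theta_i) \ge \sum W[\theta_i, \bar\theta_{\rho'}(E)]\, \rho_i'\, p(E \mid \theta_i)$$

for every point $E$ outside $S$. Hence $I_1 > I_2$, that is to say

$$\sum (\rho_i + \Delta\rho_i)\, r_i(\rho) > \sum (\rho_i + \Delta\rho_i)\, r_i(\rho'). \tag{16}$$

Analogously we get

$$\sum \rho_i\, r_i(\rho) < \sum \rho_i\, r_i(\rho'). \tag{17}$$

Multiplying (16) by an arbitrary positive value $\lambda$ and subtracting (17) we get

$$\sum [\lambda(\rho_i + \Delta\rho_i) - \rho_i]\, r_i(\rho) > \sum [\lambda(\rho_i + \Delta\rho_i) - \rho_i]\, r_i(\rho').$$

Hence

$$\sum [(\lambda - 1)\rho_i + \lambda\,\Delta\rho_i]\,[r_i(\rho') - r_i(\rho)] < 0.$$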
Let us denote for any $\rho$ the maximum of the numbers

$$r_1(\rho), \ldots, r_s(\rho)$$

by $r(\rho)$. We shall call a distribution $\rho$ for which $r(\rho)$ becomes a minimum, a
risk-minimizing distribution. We shall say that the risk-minimizing distribution
$\rho = (\rho_1, \ldots, \rho_s)$ is not degenerate if $\rho_1 > 0, \ldots, \rho_s > 0$. Otherwise we
shall say that $\rho$ is degenerate.

Lemma 8. There exists at least one risk-minimizing distribution $\rho$.

From Lemma 6 it follows that $r_1(\rho), \ldots, r_s(\rho)$ are continuous functions of $\rho$.
Hence also $r(\rho)$ is continuous. Since the set of all possible distributions $\rho$ is
bounded and closed, there must be at least one distribution $\rho$ for which $r(\rho)$
becomes a minimum.


Lemma 9. If $\rho = (\rho_1, \ldots, \rho_s)$ denotes a risk-minimizing distribution which is
not degenerate then

$$r_1(\rho) = r_2(\rho) = \cdots = r_s(\rho).$$

Let us assume that there are two integers $i$ and $j$, for instance 1 and 2, such
that $r_1(\rho) < r_2(\rho)$. We shall deduce a contradiction from this assumption. Let
us consider two different distributions $\rho' = (\rho_1', \ldots, \rho_s')$ and $\rho'' = (\rho_1'', \ldots, \rho_s'')$
where $\rho_1' > 0$ and $\rho_1'' > 0$. Hence at least one of the quantities

$$(\rho_1' - \rho_1''), \ldots, (\rho_s' - \rho_s'')$$

is unequal to zero. Since $\sum \rho_i' = \sum \rho_i'' = 1$, also at least one of the quantities

$$(\rho_2' - \rho_2''), \ldots, (\rho_s' - \rho_s'')$$

must be unequal to zero. On account of Lemma 7 we have

$$\sum_{i=1}^{s} [(\lambda - 1)\rho_i' + \lambda(\rho_i'' - \rho_i')]\,[r_i(\rho'') - r_i(\rho')] < 0.$$

If we put $\lambda = \rho_1' / \rho_1''$, the term with $i = 1$ vanishes and we get

$$\sum_{i=2}^{s} \left[ \left( \frac{\rho_1'}{\rho_1''} - 1 \right) \rho_i' + \frac{\rho_1'}{\rho_1''} (\rho_i'' - \rho_i') \right] [r_i(\rho'') - r_i(\rho')] < 0.$$

Hence at least one of the quantities

$$r_2(\rho'') - r_2(\rho'), \ldots, r_s(\rho'') - r_s(\rho')$$

must be unequal to zero.


Since $\rho_1 > 0$, there exists a closed sphere $S_\rho$ with center at $\rho$ such that for any
point $\rho'$ of $S_\rho$, $\rho_1' > 0$. Hence for any two different points $\rho'$ and $\rho''$ of $S_\rho$ at
least one of the quantities

$$r_2(\rho'') - r_2(\rho'), \ldots, r_s(\rho'') - r_s(\rho')$$

is unequal to zero. Denote by $\bar S_\rho$ the projection of $S_\rho$ on the $s - 1$ dimensional
space given by $\rho_1 = 0$. Consider the transformation according to which the
image of the point $\bar\rho' = (\rho_2', \ldots, \rho_s')$ of $\bar S_\rho$ is the point
$\bar r(\bar\rho') = [r_2(\rho'), \ldots, r_s(\rho')]$. It is obvious that the images of two different points
of $\bar S_\rho$ are different. Since $r_i(\rho)$ ($i = 1, \ldots, s$) is continuous, the transformation is
continuous and therefore topological. Denote the image of $\bar S_\rho$ by $R_\rho$. Since
$\bar\rho = (\rho_2, \ldots, \rho_s)$ is an interior point of $\bar S_\rho$, according to the Brouwer-Jordan
theorem⁶ on domain invariance the image $\bar r(\bar\rho) = [r_2(\rho), \ldots, r_s(\rho)]$ of $\bar\rho$ must also
be an interior point of $R_\rho$. Hence for sufficiently small $\epsilon > 0$ the point

$$t(\epsilon) = [r_2(\rho) - \epsilon, \ldots, r_s(\rho) - \epsilon]$$

is contained in $R_\rho$. Denote by $\bar\rho(\epsilon) = [\rho_2(\epsilon), \ldots, \rho_s(\epsilon)]$ the point of $\bar S_\rho$ whose
image is $t(\epsilon)$. It is obvious that

$$\lim_{\epsilon = 0} \bar\rho(\epsilon) = \bar\rho = (\rho_2, \ldots, \rho_s). \tag{18}$$

Consider the point $\rho(\epsilon)$ of $S_\rho$ whose projection is $\bar\rho(\epsilon)$, that is to say $\rho(\epsilon)$ has the
co-ordinates $1 - \sum_{i=2}^{s} \rho_i(\epsilon),\ \rho_2(\epsilon), \ldots, \rho_s(\epsilon)$. From (18) it follows that also

$$\lim_{\epsilon = 0} \rho(\epsilon) = \rho = (\rho_1, \rho_2, \ldots, \rho_s). \tag{19}$$

Since $r_1[\rho(\epsilon)], \ldots, r_s[\rho(\epsilon)]$ are continuous functions of $\epsilon$ and since $r_1(\rho) < r_2(\rho)$,
for sufficiently small $\epsilon$ the maximum of the numbers

$$r_1[\rho(\epsilon)],\quad r_2[\rho(\epsilon)] = r_2(\rho) - \epsilon,\quad \ldots,\quad r_s[\rho(\epsilon)] = r_s(\rho) - \epsilon$$

is certainly smaller than the maximum $r(\rho)$ of the numbers

$$r_1(\rho), \ldots, r_s(\rho),$$

in contradiction to our assumption that $\rho$ is a risk-minimizing distribution.
Hence the assumption $r_1(\rho) < r_2(\rho)$ is proved to be an absurdity and Lemma 9
is proved.
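
The equalization of risks asserted in Lemma 9 can be observed in a small numerical sketch. Everything concrete here is an assumption made for illustration and does not come from the text: two parameter points, a binomial observation standing in for the sample (a discrete sample space, whereas the text works with densities), and the squared error as weight function, for which the minimum risk estimate is the posterior mean. The code scans non-degenerate distributions $(\rho_1, 1 - \rho_1)$ on a grid and looks for the one with the smallest maximum risk.

```python
from math import comb

THETAS = (0.2, 0.6)  # two fixed parameter points theta_1, theta_2 (illustrative)
N = 4                # number of Bernoulli trials (illustrative)


def likelihood(x, theta):
    """Probability of x successes in N trials; stands in for p(E | theta)."""
    return comb(N, x) * theta ** x * (1.0 - theta) ** (N - x)


def minimum_risk_estimate(x, rho):
    """Under the squared-error weight the minimum risk estimate is the posterior mean."""
    post = [r * likelihood(x, t) for r, t in zip(rho, THETAS)]
    return sum(w * t for w, t in zip(post, THETAS)) / sum(post)


def risks(rho):
    """Return [r_1(rho), r_2(rho)]: the risk of the minimum risk estimate at each point."""
    est = [minimum_risk_estimate(x, rho) for x in range(N + 1)]
    return [sum((est[x] - t) ** 2 * likelihood(x, t) for x in range(N + 1))
            for t in THETAS]


def max_risk(rho_1):
    """r(rho) for the distribution rho = (rho_1, 1 - rho_1)."""
    return max(risks((rho_1, 1.0 - rho_1)))


# Scan non-degenerate distributions for the smallest maximum risk.
grid = [k / 2000 for k in range(1, 2000)]
rho_1 = min(grid, key=max_risk)
r_1, r_2 = risks((rho_1, 1.0 - rho_1))
print(rho_1, r_1, r_2)
```

Up to the resolution of the grid, the two printed risks should agree, which is what Lemma 9 leads one to expect for a non-degenerate risk-minimizing distribution.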

In the previous arguments we have considered an arbitrary but fixed system
of $s$ parameter points $\theta_1, \ldots, \theta_s$ and all distributions $\rho$ were related to these
points. In the following arguments we shall vary the points $\theta_1, \ldots, \theta_s$ and
therefore we shall have to state the parameter points to which the distribution $\rho$
is related.

Let us consider a sequence $\{\theta_\nu\}$ ($\nu = 1, \ldots,$ ad inf.) of parameter points
which is dense in $\Omega$. We say that a subset $\omega$ of $\Omega$ is dense in $\Omega$ if for each point $\theta$
of $\Omega$ any arbitrarily small open neighborhood of $\theta$ contains at least one point of $\omega$.
Since $\Omega$ is compact, a sequence $\{\theta_\nu\}$ which is dense in $\Omega$ certainly exists. Let us
consider the first $s$ points $\theta_1, \ldots, \theta_s$ of the sequence $\{\theta_\nu\}$. According to
Lemma 8 there exists for any $s$ a risk-minimizing distribution
$\rho(s) = [\rho_1(s), \ldots, \rho_s(s)]$ related to $\theta_1, \ldots, \theta_s$.

Assumption 6. There exists a sequence $\{\theta_s\}$ ($s = 1, \ldots,$ ad inf.) of parameter
points which is dense in $\Omega$ and such that for almost any $s$⁷ the risk-minimizing
distribution $\rho(s) = [\rho_1(s), \ldots, \rho_s(s)]$ related to the first $s$ points $\theta_1, \ldots, \theta_s$
is not degenerate.

6 See for instance Alexandroff and Hopf, Topologie, Berlin 1935, p. 396.
7 By "almost any $s$" we understand "for all $s$ greater than a sufficiently large integer."

Lemma 10. Denote by $\{\theta_s\}$ ($s = 1, 2, \ldots,$ ad inf.) a sequence of parameter
points for which the conditions of Assumption 6 are fulfilled. Denote by
$\rho(s) = [\rho_1(s), \ldots, \rho_s(s)]$ the risk-minimizing distribution related to the first $s$ points
$\theta_1, \ldots, \theta_s$. Then there exists a non-negative constant $c$ such that for any
arbitrarily small positive $\epsilon$ the inequality

$$c - \epsilon < \int_M W[\theta, \bar\theta_{\rho(s)}(E)]\, p(E \mid \theta)\, dE < c + \epsilon$$

holds identically in $\theta$ for almost every $s$. That is to say the risk function of the
minimum risk estimate $\bar\theta_{\rho(s)}(E)$ lies entirely between $c - \epsilon$ and $c + \epsilon$ for almost
every $s$.

Denote the risk function

$$\int_M W[\theta, \bar\theta_{\rho(s)}(E)]\, p(E \mid \theta)\, dE$$

of the estimate $\bar\theta_{\rho(s)}(E)$ by $r(\theta, s)$. First we shall prove that there exists a
sequence $\{c_s\}$ ($s = 1, \ldots,$ ad inf.) of non-negative numbers such that for every
$\epsilon > 0$ the inequality

$$c_s - \epsilon < r(\theta, s) < c_s + \epsilon \tag{20}$$

holds for almost every $s$. In fact to any positive $\eta$ a positive integer $s_\eta$ can be
given such that for any $s > s_\eta$ the points $\theta_1, \ldots, \theta_s$ are $\eta$-dense in $\Omega$. That is
to say every point $\theta$ of $\Omega$ lies in a sphere with radius $\eta$ and center in one of the
points $\theta_1, \ldots, \theta_s$. Since for sufficiently large $s$ $\rho(s)$ is not degenerate, we have
on account of Lemma 9 for sufficiently large $s$

$$r(\theta_1, s) = \cdots = r(\theta_s, s) = c_s. \tag{21}$$

Since for sufficiently large $s$ $\theta_1, \ldots, \theta_s$ is $\eta$-dense in $\Omega$, we get easily from
Lemma 5 that (20) holds for any positive $\epsilon$ for almost every $s$.
In order to prove Lemma 10 we have only to show that $\lim_{s = \infty} c_s$ exists and is
finite. First we see that for no estimate $\bar\theta(E)$ can the corresponding risk function

$$r(\theta) = \int_M W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE$$

lie entirely below $r(\theta, s)$, that is to say

$$r(\theta) < r(\theta, s) \tag{22}$$

cannot hold for every $\theta$. In fact if (22) were true for every $\theta$ for a certain estimate
$\bar\theta(E)$ then

$$\sum \rho_i(s)\, r(\theta_i) = \int_M \sum W[\theta_i, \bar\theta(E)]\, \rho_i(s)\, p(E \mid \theta_i)\, dE < \sum \rho_i(s)\, r(\theta_i, s) = \int_M \sum W[\theta_i, \bar\theta_{\rho(s)}(E)]\, \rho_i(s)\, p(E \mid \theta_i)\, dE,$$

which is not possible since $\bar\theta_{\rho(s)}(E)$ is a minimum risk estimate. Hence (22)
cannot hold for every $\theta$. From this fact follows easily that $\lim c_s$ exists and is
finite. This proves Lemma 10.

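Lemma 11. Denote by $f(\theta)$ a distribution of $\theta$ and let $\bar\theta_f(E)$ be the corresponding
minimum risk estimate. If $\bar\theta(E)$ denotes an arbitrary estimate then

$$r(\theta) = r_f(\theta)$$

if $\bar\theta_f(E) \ne \bar\theta(E)$ only in a set of measure 0, and

$$\int_\Omega r(\theta)\, df > \int_\Omega r_f(\theta)\, df$$

if $\bar\theta_f(E) \ne \bar\theta(E)$ in a set of positive measure. $r(\theta)$ denotes the risk function of
$\bar\theta(E)$ and $r_f(\theta)$ denotes the risk function of $\bar\theta_f(E)$.

If $\bar\theta_f(E) \ne \bar\theta(E)$ only in a set of measure zero, then we have obviously
$r(\theta) = r_f(\theta)$. Consider the case that $\bar\theta_f(E) \ne \bar\theta(E)$ in a set $M'$ of positive measure.
According to Assumption 4 we have

$$\int_\Omega W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, df(\theta) > \int_\Omega W[\theta, \bar\theta_f(E)]\, p(E \mid \theta)\, df(\theta)$$

for any point $E$ of $M'$. Since

$$\int_\Omega W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, df(\theta) = \int_\Omega W[\theta, \bar\theta_f(E)]\, p(E \mid \theta)\, df(\theta)$$

for any other point $E$ of the sample space $M$, we get

$$\int_\Omega r(\theta)\, df = \int_M \int_\Omega W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, df\, dE > \int_M \int_\Omega W[\theta, \bar\theta_f(E)]\, p(E \mid \theta)\, df\, dE = \int_\Omega r_f(\theta)\, df.$$

Hence Lemma 11 is proved.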

We are now able to prove some theorems about the best estimate $\bar\theta(E)$ relative
to a given weight function. An estimate $\bar\theta(E)$ is a best estimate according to
our definition 7, if the maximum of the risk function of $\bar\theta(E)$ is less than or equal
to the maximum of the risk function of any other estimate $\theta^*(E)$ and if $\bar\theta(E)$ is an
admissible estimate (that is to say there exists no estimate $\theta^*(E)$ such that the
risk function $r^*(\theta)$ of $\theta^*(E)$ is not identical to the risk function $r(\theta)$ of $\bar\theta(E)$ and
in every point $\theta$ $r(\theta) \ge r^*(\theta)$).

THEOREM 1. If $\bar\theta(E)$ is a best estimate and if the Assumptions 1-6 are fulfilled
then the risk function $r(\theta)$ of $\bar\theta(E)$ is constant, that is to say

$$r(\theta) = \text{const.}$$


According to Assumption 6 there exists a sequence $\{\theta_s\}$ ($s = 1, \ldots,$ ad inf.)
of parameter points such that $\{\theta_s\}$ is dense in $\Omega$ and for almost every $s$ the
risk-minimizing distribution $\rho(s)$ related to $\theta_1, \ldots, \theta_s$ is not degenerate. On
account of Lemma 10 there exists a non-negative constant $c$ such that for any
$\epsilon > 0$ the inequality

$$c - \epsilon < r(\theta, s) < c + \epsilon \tag{23}$$

holds for almost every $s$. $r(\theta, s)$ denotes the risk function of the estimate
$\bar\theta_{\rho(s)}(E)$. According to Lemma 2 there exists a subsequence $\{s_n\}$ ($n = 1, \ldots,$
ad inf.) of integers such that the sequence $\{\rho(s_n)\}$ of distributions converges
towards a distribution $f(\theta)$. From Lemma 6 it follows that

$$\lim_{n = \infty} r(\theta, s_n) = r_f(\theta)$$

where $r_f(\theta)$ denotes the risk function of the minimum risk estimate $\bar\theta_f(E)$. On
account of (23) we have

$$r_f(\theta) = c.$$

From Lemma 11 it follows that for any other estimate $\bar\theta(E)$ either

$$r(\theta) = r_f(\theta) = c$$

or

$$\int_\Omega r(\theta)\, df > \int_\Omega r_f(\theta)\, df,$$

where $r(\theta)$ denotes the risk function of $\bar\theta(E)$. In the latter case there exists at
least one point $\theta$ for which $r(\theta) > r_f(\theta)$. Hence $\bar\theta_f(E)$ is a best estimate. If
$\bar\theta(E)$ is also a best estimate, we get on account of Lemma 11 that $\bar\theta(E)$ can
differ from $\bar\theta_f(E)$ only in a set of measure 0 and the risk function of $\bar\theta(E)$ is
identically equal to $c$. Hence we have proved Theorem 1 and also the following
Theorems 2-3:

THEOREM 2. If the Assumptions 1-6 are fulfilled there exists a distribution
$f(\theta)$ of $\theta$ such that the corresponding minimum risk estimate $\bar\theta_f(E)$ is a best estimate.

THEOREM 3. If Assumptions 1-6 are fulfilled and $\bar\theta(E)$, $\theta^*(E)$ are best
estimates, then $\bar\theta(E) = \theta^*(E)$ almost everywhere and the corresponding risk functions
are identically equal.

Now we shall prove (without making the Assumptions 1-6)

THEOREM 4. If $W(\theta, \bar\theta)$ and $p(E \mid \theta)$ are continuous and $\Omega$ is compact, and if
$f(\theta)$ denotes a distribution of $\theta$ such that any open set has a positive probability,
then the minimum risk estimate $\bar\theta_f(E)$ is a best estimate if its risk function $r_f(\theta)$
is identically equal to a constant.

Let $r_f(\theta)$ be identically equal to $c$ and consider an arbitrary estimate $\bar\theta(E)$.
Since $W(\theta, \bar\theta)$ and $p(E \mid \theta)$ are continuous and $\Omega$ is compact, the risk function
$r(\theta)$ of $\bar\theta(E)$ is a continuous function of $\theta$. Since $\bar\theta_f(E)$ is a minimum risk
estimate we have

$$\int_\Omega r(\theta)\, df \ge \int_\Omega r_f(\theta)\, df = c. \tag{24}$$

In order to prove Theorem 4, we have to show that either

$$r(\theta) \equiv c \tag{25}$$

or there exists a point $\theta'$ such that

$$r(\theta') > c. \tag{26}$$

If (25) does not hold there exists a point $\theta^*$ such that $r(\theta^*) \ne c$. If $r(\theta^*) > c$
our statement is proved. Consider the case $r(\theta^*) < c$. On account of the
continuity of $r(\theta)$ there exists a positive $\delta$ and an open neighborhood $U$ of $\theta^*$
such that

$$r(\theta) < c - \delta$$

for every $\theta$ in $U$. Since $\int_U df$ is assumed to be positive, the inequality (24)
can hold only if there exists at least a point $\theta'$ for which $r(\theta') > c$. This proves
Theorem 4.
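
Theorem 4 can be illustrated by a well-known case that is not part of the text and is used here only as an assumed example: a binomial observation with $n$ trials, parameter space $[0, 1]$, squared error as weight function, and a Beta distribution with both parameters equal to $\sqrt{n}/2$ as $f(\theta)$. The sample space is discrete here, whereas the text works with densities. The minimum risk estimate is then the posterior mean $(x + \sqrt{n}/2)/(n + \sqrt{n})$, and the sketch below evaluates its risk exactly at several parameter points.

```python
from math import comb, sqrt


def risk_of_bayes_estimate(theta, n):
    """Exact squared-error risk of the posterior mean under a Beta(a, a) prior, a = sqrt(n)/2."""
    a = sqrt(n) / 2.0
    total = 0.0
    for x in range(n + 1):
        estimate = (x + a) / (n + 2.0 * a)  # posterior mean given x successes
        prob = comb(n, x) * theta ** x * (1.0 - theta) ** (n - x)
        total += (estimate - theta) ** 2 * prob
    return total


n = 16
values = [risk_of_bayes_estimate(k / 10, n) for k in range(11)]
print(min(values), max(values))
```

The risk comes out the same at every grid point, namely $n / [4 (n + \sqrt{n})^2]$, so the hypothesis of Theorem 4 (constant risk of a minimum risk estimate whose distribution gives every open set positive probability) is met in this example.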


9. Determination of the best estimate $\bar\theta(E)$ for a certain class of distributions
$p(E \mid \theta)$. In this paragraph we shall prove two theorems which enable us to
calculate very easily the best estimate $\bar\theta(E)$ for a certain special but important
class of distributions.

The risk function of an estimate $\bar\theta(E)$ is given by

$$r(\theta) = \int_M W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE,$$

where the integration is to be taken over the whole sample space $M$. We
consider the integral equation

$$\int_M W[\theta, \bar\theta(E)]\, p(E \mid \theta)\, dE = c, \tag{27}$$

where $c$ denotes an arbitrary constant. If we can find an estimate $\bar\theta(E)$ which
satisfies (27) for a certain $c$ and which is an admissible estimate relative to the
weight function considered, then $\bar\theta(E)$ is certainly a best estimate. If
Assumptions 1-6 are fulfilled, an admissible estimate satisfying (27) certainly exists.
As we shall see, a best estimate can very easily be determined by the above
procedure if the conditions in the following Theorem 5 are fulfilled.

THEOREM 5. Let us assume that the following conditions are fulfilled:

I. The parameter space $\Omega$ is one dimensional and $\theta$ can take any real value from
$-\infty$ to $+\infty$.

II. The probability density $p(E \mid \theta)$ depends only on the differences
$x_1 - \theta, \ldots, x_n - \theta$, that is to say $p(E \mid \theta) = p(x_1 - \theta, \ldots, x_n - \theta)$, where
$x_1, \ldots, x_n$ denote the co-ordinates of $E$.

III. The value of the weight function depends only on the difference $u = \theta - \bar\theta$
and is uniformly continuous in $u$.

IV. For any value $\bar\theta$ and for any sample point $E$ the integral

$$\psi(\bar\theta, E) = \int_{-\infty}^{+\infty} W(\theta - \bar\theta)\, p(E \mid \theta)\, d\theta \tag{28}$$

has a finite value.

V. For every $E$ there exists a finite value $\bar\theta'(E)$ such that $\psi(\bar\theta, E)$ becomes a
minimum for $\bar\theta = \bar\theta'(E)$.

Then there exists an estimate $\bar\theta(E)$ such that for any $E$, $\psi(\bar\theta, E)$ becomes a
minimum for $\bar\theta = \bar\theta(E)$ and $\bar\theta(E'') - \bar\theta(E') = \lambda$ for any $E' = (x_1', \ldots, x_n')$ and
$E'' = (x_1'', \ldots, x_n'')$ for which $x_1'' - x_1' = \cdots = x_n'' - x_n' = \lambda$. An estimate with
these properties is a best estimate.

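Let us consider two sample points $E' = (x_1', \ldots, x_n')$ and $E'' = (x_1'', \ldots, x_n'')$
such that $x_1'' - x_1' = \cdots = x_n'' - x_n' = \lambda$. From the conditions II and III
follows that if $\psi(\bar\theta, E')$ becomes a minimum for $\bar\theta = \bar\theta_1$ then $\psi(\bar\theta, E'')$ becomes
a minimum for $\bar\theta = \bar\theta_2 = \bar\theta_1 + \lambda$. Hence there exists an estimate
$\bar\theta(E) = \bar\theta(x_1, \ldots, x_n)$ such that for any $E$, $\psi(\bar\theta, E)$ becomes a minimum for $\bar\theta = \bar\theta(E)$
and $\bar\theta(E'') - \bar\theta(E') = \lambda$ if $x_1'' - x_1' = \cdots = x_n'' - x_n' = \lambda$. We shall show
that such an estimate $\bar\theta(E)$ is a best estimate. First we shall show that the
risk function

$$r(\theta) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W[\theta - \bar\theta(E)]\, p(x_1 - \theta, \ldots, x_n - \theta)\, dx_1 \cdots dx_n$$

is constant. Let us consider two arbitrary parameter values $\theta'$ and $\theta''$. Then
we have

$$r(\theta') = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W[\theta' - \bar\theta(E)]\, p(x_1 - \theta', \ldots, x_n - \theta')\, dx_1 \cdots dx_n,$$

$$r(\theta'') = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W[\theta'' - \bar\theta(E)]\, p(x_1 - \theta'', \ldots, x_n - \theta'')\, dx_1 \cdots dx_n.$$

Making in the second integral the transformation

$$y_1 = x_1 - (\theta'' - \theta'), \ldots, y_n = x_n - (\theta'' - \theta'),$$

we get

$$r(\theta'') = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W\{\theta'' - \bar\theta[y_1 + (\theta'' - \theta'), \ldots, y_n + (\theta'' - \theta')]\}\, p(y_1 - \theta', \ldots, y_n - \theta')\, dy_1 \cdots dy_n$$

$$= \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W[\theta' - \bar\theta(y_1, \ldots, y_n)]\, p(y_1 - \theta', \ldots, y_n - \theta')\, dy_1 \cdots dy_n.$$

Hence $r(\theta') = r(\theta'')$ and our statement that $r(\theta)$ is constant is proved. In
order to prove Theorem 5, we have only to show that $\bar\theta(E)$ is an admissible
estimate. For this purpose let us consider an arbitrary estimate $\theta^*(E)$ and
denote the corresponding risk function by $r^*(\theta)$. Since $\bar\theta(E)$ minimizes the
integral (28), we have

$$\psi[\theta^*(E), E] \ge \psi[\bar\theta(E), E] \tag{29}$$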
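for all sample points $E$. Let us consider the integral

$$I = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \{W[\theta - \bar\theta(E)] - W[\theta - \theta^*(E)]\}\, p(E \mid \theta)\, d\theta\, dx_1 \cdots dx_n. \tag{30}$$

Integrating (30) with respect to $\theta$ we get

$$I = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \{\psi[\bar\theta(E), E] - \psi[\theta^*(E), E]\}\, dx_1 \cdots dx_n. \tag{31}$$

Integrating (30) with respect to $E$, we get

$$I = \int_{-\infty}^{+\infty} [r(\theta) - r^*(\theta)]\, d\theta. \tag{32}$$

On account of (29) and (31) we have $I \le 0$, hence

$$\int_{-\infty}^{+\infty} [r(\theta) - r^*(\theta)]\, d\theta \le 0. \tag{33}$$

From (33) it follows that if $r^*(\theta) \le r(\theta)$ for every $\theta$ then $r^*(\theta) < r(\theta)$ can hold
only for the points of a set of measure zero. In case $r^*(\theta)$ is continuous, this
means that $r^*(\theta) \equiv r(\theta)$. Hence if $r^*(\theta)$ is continuous, then either $r^*(\theta) \equiv r(\theta)$
or there exists at least one point $\theta'$ such that $r^*(\theta') > r(\theta')$. The risk function
$r^*(\theta)$ is continuous if the estimate $\theta^*(E)$ is uniformly continuous in the whole
sample space. In fact, we have

$$r^*(\theta + t) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W[\theta + t - \theta^*(E)]\, p(x_1 - \theta - t, \ldots, x_n - \theta - t)\, dx_1 \cdots dx_n.$$

Making the transformation

$$y_i = x_i - t \quad (i = 1, \ldots, n)$$

we get

$$r^*(\theta + t) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} W[\theta + t - \theta^*(y_1 + t, \ldots, y_n + t)]\, p(y_1 - \theta, \ldots, y_n - \theta)\, dy_1 \cdots dy_n.$$

Since $W(u)$ and $\theta^*(E)$ are uniformly continuous, from the latter equation we
get easily

$$\lim_{t = 0} r^*(\theta + t) = r^*(\theta),$$

that is to say $r^*(\theta)$ is continuous. Considering only continuous estimates the
admissibility of $\bar\theta(E)$, and therefore also Theorem 5, is proved. If $\theta^*(E)$ is not
uniformly continuous we have only proved that if $r^*(\theta) \le r(\theta)$ for every $\theta$,
then $r^*(\theta) < r(\theta)$ can hold only in a set of measure zero. I should like to
mention without proof that even if $\theta^*(E)$ is not continuous, $r^*(\theta) \le r(\theta)$ implies
$r^*(\theta) \equiv r(\theta)$.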
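
The first step in the proof of Theorem 5, the constancy of the risk function for an estimate with $\bar\theta(E'') - \bar\theta(E') = \lambda$, can be checked numerically. The sketch below rests on assumptions chosen only for illustration: a single observation ($n = 1$) with a standard normal density for $x - \theta$, an asymmetric piecewise linear weight in $u = \theta - \bar\theta$, and the hypothetical estimate $x - 0.4$. The integration grid in $x$ is kept fixed, so the agreement of the values is not built into the discretization.

```python
import math


def weight(u):
    """Asymmetric weight in u = theta - estimate: overestimation costs twice as much."""
    return -2.0 * u if u < 0 else u


def density(z):
    """Standard normal density; p(E | theta) = density(x - theta) for one observation."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def estimate(x, shift=-0.4):
    """A translation-equivariant estimate: estimate(x + lam) = estimate(x) + lam."""
    return x + shift


def risk(theta, lo=-30.0, hi=30.0, steps=60000):
    """Midpoint-rule approximation of r(theta) on a fixed grid in x."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += weight(theta - estimate(x)) * density(x - theta)
    return total * h


values = [risk(theta) for theta in (-3.0, 0.0, 2.5)]
print(values)
```

The three printed values should agree up to the discretization error, in line with $r(\theta') = r(\theta'')$.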

An estimate $\bar\theta(E)$ is called a maximum likelihood estimate if for any fixed $E$
$p(E \mid \theta)$ becomes a maximum with respect to $\theta$ for $\theta = \bar\theta(E)$.

THEOREM 6. Consider the following conditions:

VI. There exists exactly one maximum likelihood estimate $\bar\theta(E)$ with the
following properties:

a) For any $E$ $p(E \mid \theta)$ is non-decreasing with increasing $\theta$ for $\theta < \bar\theta(E)$ and
non-increasing with increasing $\theta$ for $\theta > \bar\theta(E)$.

b) For any $E$ $p(E \mid \theta)$ is a symmetric function of $\theta$ about $\bar\theta(E)$, that is to say,
for any real value $\lambda$, $p[E \mid \bar\theta(E) - \lambda] = p[E \mid \bar\theta(E) + \lambda]$.

VII. The value of the weight function depends only on the absolute value of the
difference $u = \theta - \bar\theta$ and $\frac{dW(u)}{du}$ exists, is uniformly continuous and $> 0$ for
$u > 0$.

If the conditions I-V of Theorem 5 and the above condition VII are fulfilled,
and if $\bar\theta(E)$ is a maximum likelihood estimate satisfying VI, then $\bar\theta(E)$ is a best
estimate.

Assume that the conditions I-V and VII are satisfied and that $\bar\theta(E)$ is a
maximum likelihood estimate satisfying VI. It is obvious that $\bar\theta(E'') - \bar\theta(E') = \lambda$
for $E' = (x_1, \ldots, x_n)$ and $E'' = (x_1 + \lambda, \ldots, x_n + \lambda)$. In order to prove
Theorem 6, we have, according to Theorem 5, only to show that the integral
in (28)

$$\psi(\bar\theta, E) = \int_{-\infty}^{+\infty} W(\theta - \bar\theta)\, p(E \mid \theta)\, d\theta$$

becomes a minimum for $\bar\theta = \bar\theta(E)$. Denote $\theta - \bar\theta$ by $u$. Since $\frac{dW(u)}{du}$ is
uniformly continuous, we have

$$\frac{\partial \psi(\bar\theta, E)}{\partial \bar\theta} = -\int_{-\infty}^{+\infty} \frac{dW(u)}{du}\, p(E \mid \theta)\, d\theta.$$

Since $W$ depends only on $|u|$, its derivative at $-u$ is the negative of its derivative at $u$,
and we have

$$\frac{\partial \psi(\bar\theta, E)}{\partial \bar\theta} = \int_0^{\infty} \frac{dW(u)}{du}\, [p(E \mid \bar\theta - u) - p(E \mid \bar\theta + u)]\, du. \tag{34}$$

From condition VI it follows easily that for any fixed $E$ and $\bar\theta$ the function of $u$
($0 < u < \infty$)

$$p(E \mid \bar\theta - u) - p(E \mid \bar\theta + u)$$

does not change its sign and if $\bar\theta \ne \bar\theta(E)$ there exists an interval $J$ such that the
above expression is unequal to zero for every point $u$ of $J$. Hence on account
of $\frac{dW(u)}{du} > 0$ for $u > 0$, the integral in (34) vanishes only for $\bar\theta = \bar\theta(E)$. Since
according to the condition V there exists a finite value $\bar\theta'$ such that $\psi(\bar\theta, E)$
becomes a minimum for $\bar\theta = \bar\theta'$, $\bar\theta'$ must be equal to $\bar\theta(E)$. This proves
Theorem 6.

The condition VI is seldom exactly fulfilled. But for large $n$, in the great
majority of practical cases, VI will be fulfilled with good approximation and the
best estimate approaches the maximum likelihood estimate with increasing $n$.


10. Two examples. As a first example we consider a normal distribution
with the variance 1. The mean value $\theta$ is unknown and we have to estimate
it by means of a sample $E = (x_1, \ldots, x_n)$. In this case

$$p(E \mid \theta) = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2} \sum (x_i - \theta)^2}.$$

It is obvious that for a very broad class of weight functions the conditions I-V
of Theorem 5 are fulfilled. The maximum likelihood estimate
$\bar\theta(x_1, \ldots, x_n) = \frac{x_1 + \cdots + x_n}{n}$ satisfies the condition VI of Theorem 6. Hence if the weight
function satisfies also the condition VII, then the best estimate of $\theta$ is the
maximum likelihood estimate $\bar\theta(x_1, \ldots, x_n) = \frac{x_1 + \cdots + x_n}{n}$.

Let us now consider a weight function defined as follows:

$$W(\theta, \bar\theta) = 2(\bar\theta - \theta) \quad \text{if } \bar\theta > \theta$$

and

$$W(\theta, \bar\theta) = \theta - \bar\theta \quad \text{if } \bar\theta < \theta.$$

Since for this weight function the conditions I-V are satisfied, according to
Theorem 5 the best estimate of $\theta$ is the value $\bar\theta$ for which the integral

$$\int_{-\infty}^{+\infty} W(\theta, \bar\theta)\, e^{-\frac{1}{2} \sum (x_i - \theta)^2}\, d\theta = \int_{-\infty}^{\bar\theta} 2(\bar\theta - \theta)\, e^{-\frac{1}{2} \sum (x_i - \theta)^2}\, d\theta + \int_{\bar\theta}^{+\infty} (\theta - \bar\theta)\, e^{-\frac{1}{2} \sum (x_i - \theta)^2}\, d\theta$$

becomes a minimum. As an easy calculation shows, the estimate obtained in
this way is not the arithmetic mean.
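
The "easy calculation" is not spelled out in the text; the following sketch gives one way it might be carried out and should be read as an assumption-laden illustration. As a function of $\theta$ the integrand's exponential factor is proportional to a normal density with mean equal to the arithmetic mean of the sample and variance $1/n$; writing $G$ for the corresponding distribution function and setting the derivative of the integral with respect to $\bar\theta$ to zero gives $2G(\bar\theta) = 1 - G(\bar\theta)$, that is $G(\bar\theta) = 1/3$. The function names and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def weight(theta, estimate):
    """W = 2(estimate - theta) if estimate > theta, and theta - estimate otherwise."""
    return 2.0 * (estimate - theta) if estimate > theta else theta - estimate


def psi(estimate, sample, steps=20000):
    """Midpoint-rule value of the integral of W * exp(-0.5 * sum((x_i - theta)^2)) over theta."""
    n, mean = len(sample), fmean(sample)
    lo, hi = mean - 10.0 / math.sqrt(n), mean + 10.0 / math.sqrt(n)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        theta = lo + (i + 0.5) * h
        total += weight(theta, estimate) * math.exp(-0.5 * sum((x - theta) ** 2 for x in sample))
    return total * h


def best_estimate(sample):
    """Solve G(estimate) = 1/3, where G is the N(mean, 1/n) distribution function."""
    n, mean = len(sample), fmean(sample)
    return NormalDist(mean, 1.0 / math.sqrt(n)).inv_cdf(1.0 / 3.0)


sample = [0.3, -1.2, 0.8, 2.1]  # illustrative data
est = best_estimate(sample)
print(fmean(sample), est, psi(est, sample), psi(fmean(sample), sample))
```

Because overestimation is weighted twice as heavily as underestimation, the minimizing value lies below the arithmetic mean, by about $0.43/\sqrt{n}$ under these assumptions.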
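As a second example we consider the family of variates $X(\theta)$ with the
probability density $f(x, \theta)$ defined as follows:

$$f(x, \theta) = 1 \quad \text{if } \theta - \tfrac{1}{2} \le x \le \theta + \tfrac{1}{2}$$

and

$$f(x, \theta) = 0 \quad \text{for all other values of } x.$$

If $E = (x_1, \ldots, x_n)$ denotes a sample point where $x_1$ denotes the smallest and
$x_n$ denotes the greatest value in the sample, then

$$p(E \mid \theta) = \prod_{i=1}^{n} f(x_i, \theta) = 1 \quad \text{if } x_n - \tfrac{1}{2} \le \theta \le x_1 + \tfrac{1}{2}$$

and

$$p(E \mid \theta) = 0 \quad \text{for all other values of } \theta.$$

The classical method of maximum likelihood cannot be applied here, since
$p(E \mid \theta)$ is maximum for every value $\theta$ for which $x_n - \tfrac{1}{2} \le \theta \le x_1 + \tfrac{1}{2}$. It is
obvious that for a broad class of weight functions the conditions I-V are
satisfied. The estimate $\bar\theta(E) = \frac{x_1 + x_n}{2}$, where $x_1$ denotes the smallest and $x_n$ the
greatest value in the sample, satisfies the condition VI. Hence if the weight
function satisfies also the condition VII, the best estimate of $\theta$ is given by

$$\bar\theta(E) = \frac{x_1 + x_n}{2}.$$

Let us now calculate the best estimate of $\theta$ if the weight function is given as
follows:

$$W(\theta, \bar\theta) = \theta - \bar\theta \quad \text{if } \bar\theta < \theta$$

and

$$W(\theta, \bar\theta) = 2(\bar\theta - \theta) \quad \text{if } \bar\theta > \theta.$$

In this case the conditions I-V are satisfied but not the condition VII. We
have to calculate the integral $\psi(\bar\theta, E)$ given in (28), which reduces in this case to

$$\psi(\bar\theta, E) = \int_{x_n - \frac{1}{2}}^{x_1 + \frac{1}{2}} W(\theta, \bar\theta)\, d\theta = \int_{x_n - \frac{1}{2}}^{\bar\theta} 2(\bar\theta - \theta)\, d\theta + \int_{\bar\theta}^{x_1 + \frac{1}{2}} (\theta - \bar\theta)\, d\theta$$

$$= 1.5\,\bar\theta^2 - [(x_1 + \tfrac{1}{2}) + 2(x_n - \tfrac{1}{2})]\,\bar\theta + \tfrac{1}{2}(x_1 + \tfrac{1}{2})^2 + (x_n - \tfrac{1}{2})^2.$$

This expression becomes a minimum for

$$\bar\theta = \frac{(x_1 + \tfrac{1}{2}) + 2(x_n - \tfrac{1}{2})}{3}.$$

Hence the best estimate of $\theta$ is given by this expression.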
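
The quadratic expression for $\psi(\bar\theta, E)$ in the second example is easy to check numerically. In the sketch below the function names and the sample values are hypothetical; the minimizing value is obtained by setting the derivative of the quadratic to zero, $3\bar\theta = (x_1 + \tfrac{1}{2}) + 2(x_n - \tfrac{1}{2})$, and it is then compared with a grid of other admissible values of $\bar\theta$.

```python
def interval(sample):
    """Return (a, b) = (x_n - 1/2, x_1 + 1/2), the range of theta with p(E | theta) = 1."""
    return max(sample) - 0.5, min(sample) + 0.5


def psi(estimate, sample):
    """psi = 1.5 t^2 - [(x_1 + 1/2) + 2 (x_n - 1/2)] t + (x_1 + 1/2)^2 / 2 + (x_n - 1/2)^2."""
    a, b = interval(sample)
    return 1.5 * estimate ** 2 - (b + 2.0 * a) * estimate + 0.5 * b ** 2 + a ** 2


def best_estimate(sample):
    """Zero of the derivative of psi: [(x_1 + 1/2) + 2 (x_n - 1/2)] / 3."""
    a, b = interval(sample)
    return (b + 2.0 * a) / 3.0


sample = [0.1, 0.4, 0.3]  # illustrative data: x_1 = 0.1, x_n = 0.4
a, b = interval(sample)
est = best_estimate(sample)
midrange = (min(sample) + max(sample)) / 2.0
print(est, midrange, psi(est, sample), psi(midrange, sample))
```

With this weight function the best estimate lies below the midrange $(x_1 + x_n)/2$, which would be the best estimate under a weight function satisfying condition VII.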


11. Miscellaneous remarks. Assumptions 1-6 of paragraph 8 are sufficient
but not necessary for the proof of the Theorems 1-3 (Theorems 4-6 have been
deduced without Assumptions 1-6). They can be weakened in many respects.
The assumption that the parameter space is bounded can be dropped if we
impose certain conditions on the weight function $W(\theta, \bar\theta)$ and the probability
density $p(E \mid \theta)$. It is certainly not necessary to assume that $W(\theta, \bar\theta)$ and
$p(E \mid \theta)$ are everywhere continuous. It is however doubtful whether Theorems
1-3 remain valid in the form in which they are stated, if we admit
discontinuities in a set of measure zero without imposing any other restrictions. Also
Assumptions 4-6 can in all probability be essentially weakened.

I should like to mention that Assumption 4 is not as restrictive as it would
appear. Let us make this clear in the case that the parameter space is a
one-dimensional interval $[a, b]$. If we assume that $W(\theta, \bar\theta)$ is a polynomial of the
second degree in $\bar\theta$ and the coefficient of $\bar\theta^2$ is positive for every $\theta$, and if
$p(E \mid \theta) > 0$ for every $E$ and $\theta$, the Assumption 4 can easily be proved. In fact,

$$\psi(\bar\theta, E) = \int_\Omega W(\theta, \bar\theta)\, p(E \mid \theta)\, df(\theta) = A(E) + B(E)\,\bar\theta + C(E)\,\bar\theta^2.$$

Since the coefficient of $\bar\theta^2$ in $W(\theta, \bar\theta)$ is positive and since $p(E \mid \theta) > 0$ for every
$E$ and $\theta$, $C(E) > 0$ for every $E$ and for any arbitrary distribution $f(\theta)$. From
this fact follows easily that for every $E$ there exists a value $\bar\theta(E)$ in the interval
$[a, b]$ such that

$$\psi[\bar\theta(E), E] < \psi(\bar\theta, E)$$

for every $\bar\theta$ contained in $[a, b]$ and unequal to $\bar\theta(E)$. Hence Assumption 4 is
proved.

Let us consider a system $S$ of subsets of the parameter space $\Omega$ and the
corresponding system $H_S$ of hypotheses. The weight function $W(\theta, \omega)$ is defined
for all points $\theta$ of $\Omega$ and for all elements $\omega$ of $S$ and expresses the weight of the
error committed by accepting $H_\omega$ when $\theta$ is true. If $\theta$ is an element of $\omega$ then
$W(\theta, \omega)$ is of course equal to zero. Let us assume that $W(\theta, \omega)$ has the special
form: $W(\theta, \omega) = 1$ if $\theta$ is not contained in $\omega$, and $W(\theta, \omega) = 0$ if $\theta$ is an element
of $\omega$. It is obvious that in this case for any $\theta$ the value of the risk function
$r(\theta)$ is equal to the probability of accepting a false hypothesis if $\theta$ is the true
parameter point. Because of this fact the theory developed here has close
relation to the theory of confidence intervals. Let us first make this clear for the
case when the parameter space is one dimensional, that is to say $\theta$ is a real
number.

In the theory of confidence intervals we estimate the unknown parameter $\theta$
by an interval $I(E)$ extending from $\theta_1(E)$ to $\theta_2(E)$ where $\theta_1(E)$ and $\theta_2(E)$ are
certain functions of the sample point $E$. The interval $I(E)$ is defined in such a
way that the following probability statement holds: If we perform an
experiment, the probability that we shall obtain a sample point $E$ such that $I(E)$ will
cover the true parameter point $\theta$, is equal to a given constant $\alpha$ (called
confidence coefficient) and is independent of the value of $\theta$. Let us consider a
certain example of such an inference with the confidence coefficient $\alpha$ and
denote by $I(E)$ the interval corresponding to $E$. We define a system $S$ of
intervals as follows: An interval $I$ is an element of $S$ if and only if there exists
a sample point $E$ for which $I(E) = I$. Consider the corresponding system $H_S$
of hypotheses and the weight function $W(\theta, I)$ defined for all values $\theta$ and all
elements $I$ of $S$ as follows:

$$W(\theta, I) = 0 \quad \text{if } \theta \text{ is a point of } I,$$
$$W(\theta, I) = 1 \quad \text{if } \theta \text{ is not contained in } I.$$

Denote by $M_S$ a best system of regions of acceptance relative to the weight
function defined above. Denote by $I'(E)$ the element of $S$ which we accept
according to $M_S$ if $E$ is the sample point. On account of the special form of the
weight function, the risk is obviously equal to the probability of accepting a
false interval. From the definition of the best system of regions it follows
that for any $\theta$ the probability that $I'(E)$ will cover $\theta$ is greater than or equal to $\alpha$.
If the risk function is constant, that is to say, if the probability that $I'(E)$ will
cover the true parameter point $\theta$ is independent of the value of $\theta$, then the
intervals $I'(E)$ are confidence intervals corresponding to a confidence
coefficient $\alpha' \ge \alpha$.

Similar observations can be made if the parameter space is $k$-dimensional
($k > 1$), that is to say, $\theta$ is a system of $k$ numbers $\theta^{(1)}, \ldots, \theta^{(k)}$. An important
case is that when we have to estimate only one of the components, say $\theta^{(1)}$, by
an interval. As the investigations of W. Feller⁸ have shown, confidence
intervals in such cases do not always exist. That is to say, it is not always possible
to determine $I(E)$ such that the probability that $I(E)$ will cover $\theta^{(1)}$ is equal
to a given constant $\alpha$ independently of the values of $\theta^{(1)}, \ldots, \theta^{(k)}$. It is of
great interest to know under what conditions confidence intervals exist. I
should like to mention that a further development of the theory given in
paragraph 8 may contribute much to the solution of this problem. In order to make
this clear, let us consider a system $S_1$ of one-dimensional intervals. To each
element $I$ of $S_1$ let there correspond the subset $\omega$ of the $k$-dimensional parameter
space $\Omega$ consisting of all points $\theta = (\theta^{(1)}, \ldots, \theta^{(k)})$ for which $\theta^{(1)}$ lies in $I$.
Consider the system $S$ of subsets $\omega$ of $\Omega$ corresponding to all elements of $S_1$ and the
system $H_S$ of hypotheses corresponding to $S$. The weight function is to be
chosen as follows: $W(\theta, \omega) = 1$ if $\theta$ is not an element of $\omega$ and $W(\theta, \omega) = 0$ if $\theta$
is an element of $\omega$. Consider a best system $M_S$ of regions of acceptance and
the corresponding risk function $r(\theta)$. On account of the special definition of
$W(\theta, \omega)$, $r(\theta)$ is equal to the probability of accepting a false hypothesis if $\theta$ is the
true parameter point. If the risk function $r(\theta)$ is identically equal to a
constant $a$, the probability of covering the true value is $1 - a$ for every $\theta$ and we have
confidence intervals corresponding to the confidence coefficient $1 - a$. In order to see
under what conditions the risk function is constant, we have
to consider an equivalent problem (see paragraph 7) where the system of
hypotheses is the system of all simple hypotheses and the weight function $W(\theta, \bar\theta)$
is given according to formula (4). If $W(\theta, \bar\theta)$ satisfies Assumptions 1-6, the
risk function of the best estimate is constant. As we have mentioned,
Assumptions 1-6 can be weakened. In order to get valuable results concerning the
problem of the existence of confidence intervals, we have to weaken especially
Assumption 2. In fact $W(\theta, \omega)$ takes only the values 1 and 0 and therefore
$W(\theta, \bar\theta)$ cannot be continuous.

8 W. Feller, "Note on Regions Similar to the Sample Space," Statistical Research Memoirs,
Vol. II, 1938.

Finally I should like to mention that the most stringent test as defined by
Robert W. B. Jackson⁹ is contained as special case in our general definition of
the best system of regions of acceptance. Jackson considers a discontinuous
parameter space $\Omega$. Consider the problem of testing the hypothesis $\theta = \theta_0$
where $\theta_0$ denotes a point of $\Omega$. According to Jackson's definition we have the
most stringent test if the critical region $\omega_0$ satisfies the condition: the maximum
of the numbers $A$ and $B$

$$A = P(E \in \omega \mid \theta_0), \quad B = \text{least upper bound of } P(E \in \bar\omega \mid \theta) \text{ formed for all } \theta \ne \theta_0,$$

becomes a minimum for $\omega = \omega_0$. $\bar\omega$ denotes the region complementary to $\omega$.
It is easy to see that Jackson's definition of the most stringent test coincides with
our definition of the best system of regions of acceptance in the following
special case:

1) $\Omega$ is discontinuous. 2) $S$ consists only of two elements.

3) The weight function $W(\theta, \omega)$ is equal to 1 if $\theta$ is not contained in $\omega$.


COLUMBIA UNIVERSITY.


⁹ Robert W. Jackson, “Testing Statistical Hypotheses,” Statistical Research Memoirs,
Vol. I, 1936.





THE DISTRIBUTION OF THE MULTIPLE CORRELATION 
COEFFICIENT IN PERIODOGRAM ANALYSIS 


By D. M. STARKEY 


1. Geometrical interpretation of the problem. We begin with a summary
of some recent work by Hotelling, in a form relevant to this particular problem.¹
He suggests that the general question of finding the distribution of the multiple
correlation coefficient corresponding to a fitted regression of $y$ upon $x$ may be
solved by evaluating definite integrals corresponding to invariants of certain
curves, surfaces, etc. For the purposes of illustration we may consider the case
of fitting the relation

$$Y = a + b f(x, k, \epsilon)$$

where $f$ is an arbitrary function, and $a$, $b$, $k$, $\epsilon$ are constants, to the observations
$y$, where we are given $n$ values of $y$, namely $y_1, y_2, \ldots, y_n$, and the corresponding values of
$x$, namely $x_1, x_2, \ldots, x_n$. We shall postulate that the $y$’s are independent and normally
distributed about a certain mean and that the regression may be fitted by
means of the principle of least squares.

We must minimize the sum of squares

$$\sum (y_\alpha - Y_\alpha)^2 = \sum [y_\alpha - a - b f(x_\alpha, k, \epsilon)]^2$$

and hence we differentiate with respect to $a$, obtaining the first condition for a
minimum

$$\sum [y_\alpha - a - b f(x_\alpha, k, \epsilon)] = 0.$$

In the following, all summations take place over the range $\alpha = 1$ to $n$. Then we
have

$$a = \bar y - b \bar f$$

where

$$\bar f = \frac{\sum f(x_\alpha, k, \epsilon)}{n}.$$

Thus we minimize the sum of squares

$$\sum [(y_\alpha - \bar y) - b(f(x_\alpha, k, \epsilon) - \bar f)]^2$$

or, putting

$$y'_\alpha = y_\alpha - \bar y, \qquad Y'_\alpha = Y_\alpha - \bar Y = b[f(x_\alpha, k, \epsilon) - \bar f],$$

we see that the quantity $\sum (y'_\alpha - Y'_\alpha)^2$ is to be minimized.


¹ Harold Hotelling, “Tubes and spheres in n-spaces, and a class of statistical problems”,
American Journal of Mathematics, April, 1939.

Geometrically we may regard the set of values $(y_1, \ldots, y_n)$ as defining a
point in $n$-space, and $(Y_1, \ldots, Y_n)$ will also represent a point in $n$-space on the
4-dimensional surface which may be obtained by eliminating $a$, $b$, $k$, $\epsilon$ from the
relations $Y = a + b f(x, k, \epsilon)$. The points $(y'_1, \ldots, y'_n)$ and $(Y'_1, \ldots, Y'_n)$
represent the orthogonal projections of $(y_1, \ldots, y_n)$ and $(Y_1, \ldots, Y_n)$
on the plane $\sum y_\alpha = 0$. Hence we have to minimize the distance between these
projections, noticing that $(Y'_1, \ldots, Y'_n)$ now lies on the 3-dimensional projection
of the surface on which $(Y_1, \ldots, Y_n)$ lies. The multiple correlation between
the observed and fitted values is defined as

$$R = \frac{\sum (y_\alpha - \bar y)(Y_\alpha - \bar Y)}{\sqrt{\sum (y_\alpha - \bar y)^2 \, \sum (Y_\alpha - \bar Y)^2}} = \frac{\sum y'_\alpha Y'_\alpha}{\sqrt{\sum y'^2_\alpha \, \sum Y'^2_\alpha}}$$

and this is equal to $\cos\theta$, where $\theta$ is the angle between the lines joining the origin
to the points $(y'_1, \ldots, y'_n)$ and $(Y'_1, \ldots, Y'_n)$. For the purpose of evaluating $R$
we may thus consider the projections of these points on the unit sphere in
$\sum y_\alpha = 0$ with centre the origin, these being

$$\left(\frac{y'_1}{\sqrt{\sum y'^2_\alpha}}, \ldots, \frac{y'_n}{\sqrt{\sum y'^2_\alpha}}\right) \quad\text{and}\quad \left(\frac{Y'_1}{\sqrt{\sum Y'^2_\alpha}}, \ldots, \frac{Y'_n}{\sqrt{\sum Y'^2_\alpha}}\right).$$
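The identity between $R$ and $\cos\theta$ can be checked numerically. The following is a minimal sketch, not part of the original paper: the data values and the helper names `centred`, `unit` and `correlation` are illustrative assumptions only.

```python
import math

def centred(values):
    """Project a point onto the plane sum(y) = 0 by subtracting the mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def unit(vector):
    """Project a point radially onto the unit sphere about the origin."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]

def correlation(y, fitted):
    """Product-moment correlation between observed and fitted values."""
    yc, fc = centred(y), centred(fitted)
    numerator = sum(a * b for a, b in zip(yc, fc))
    denominator = math.sqrt(sum(a * a for a in yc) * sum(b * b for b in fc))
    return numerator / denominator

# Hypothetical observed and fitted values, chosen only for illustration.
y = [2.0, 3.5, 1.0, 4.0, 2.5]
fitted = [2.2, 3.0, 1.5, 3.6, 2.7]

r = correlation(y, fitted)
u, v = unit(centred(y)), unit(centred(fitted))
cos_theta = sum(a * b for a, b in zip(u, v))  # cosine of the angle between the projections
```

Both projected points lie in the plane where the coordinates sum to zero, and their inner product reproduces `r`.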


As by hypothesis the distribution of $y$ has spherical symmetry about some
point on the line $y_1 = y_2 = \cdots = y_n$, then the distribution of $y'$ has spherical
symmetry about the origin, and the probability distribution of the projection
of $y'$ on the unit sphere is uniform. The projection of $Y'$ lies on a 2-dimensional
surface on the $(n - 2)$-dimensional sphere, and for a given $Y'$ the probability
that $R$ is as great or greater than $\cos\theta$ is proportional to the volume of the
sphere in the $(n - 2)$-dimensional spherical space with centre $Y'$ and geodesic
radius $\theta$, so that the total probability that $R$ lies between $\cos\theta$ and 1 is equal
to the ratio of the “area” of the portion of the unit sphere included by the
envelope of these geodesic spheres to the “area” of the unit sphere. This
envelope is that part of the unit sphere in $\sum y_\alpha = 0$ which is at a geodesic distance
$\theta$ from the 2-dimensional surface on which the projection of $Y'$ lies, termed a
“tube” by Hotelling.

For very small values of $\theta$ it may be assumed that this ratio is equal to the
area of the two-dimensional surface on which $Y'$ lies, multiplied by a fixed
multiple of $\theta^{n-4}$. This is fairly evident intuitively, but has recently been
substantiated by some results of Weyl² who shows that this is correct for small
values of $\theta$, and indicates a series from which could be derived a series of
ascending powers of $\theta^2$ by which successive adjustments could be made for larger values
of $\theta$. The coefficients in this series are finite invariants of the surface in which
we are working. If we accept the first approximation we must consider the
question of the extent of the surface, which depends on the range of values of
the parameters $k$, $\epsilon$. The range which is eventually chosen depends on the needs
of the practical statistician, while keeping in view the mathematical possibilities
of effecting a solution. In the following work we consider in particular the case
of periodogram analysis by putting $f(x, k, \epsilon) = \cos (kx + \epsilon)$.


² H. Weyl, “On the volume of tubes,” American Journal of Mathematics, April, 1939.


2. The case of periodogram analysis. With the notation of the preceding
paragraph, we fit

$$Y_\alpha = a + b \cos (k x_\alpha + \epsilon)$$

to data $(x_\alpha, y_\alpha)$, $\alpha = 1, 2, \ldots, n$.

We shall assume that the variate $x$ is a measurement of time or some other
quantity for which the measurements are made at equal intervals, taken as
unity for convenience, so that

$$x_1 = 0, \quad x_2 = 1, \quad \ldots, \quad x_n = n - 1.$$





Now we shall see later that we are interested in values of $k$ such that $0 < k < 2\pi$.
For this range

$$\bar f = \frac{\sum \cos (k x_\alpha + \epsilon)}{n} = \frac{\sin (\tfrac12 nk) \cos [\epsilon + \tfrac12 k(n - 1)]}{n \sin (\tfrac12 k)}.$$
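As a quick check of this closed form, the sketch below compares the direct average of the cosines with the right-hand side for a few parameter values. It is illustrative only; the function names and the chosen values of $n$, $k$, $\epsilon$ are assumptions, not part of the paper.

```python
import math

def f_bar_direct(n, k, eps):
    """Average of cos(k*x + eps) over the equally spaced points x = 0, 1, ..., n-1."""
    return sum(math.cos(k * x + eps) for x in range(n)) / n

def f_bar_closed(n, k, eps):
    """Closed form sin(nk/2) cos(eps + k(n-1)/2) / (n sin(k/2)), valid for 0 < k < 2*pi."""
    return (math.sin(0.5 * n * k) * math.cos(eps + 0.5 * k * (n - 1))) / (n * math.sin(0.5 * k))

# Hypothetical test points (n, k, eps) inside the range 0 < k < 2*pi.
cases = [(10, 0.7, 0.2), (25, 3.0, 1.1), (100, 5.5, 4.0)]
differences = [abs(f_bar_direct(n, k, e) - f_bar_closed(n, k, e)) for n, k, e in cases]
```

The differences should be at the level of floating-point rounding error.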


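Hence, if $Y''$ represents the projection of $Y'$ on the unit sphere,

$$Y''_\alpha = \lambda \left\{ \cos (k x_\alpha + \epsilon) - \frac{\sin (\tfrac12 nk) \cos [\epsilon + \tfrac12 k(n - 1)]}{n \sin (\tfrac12 k)} \right\}$$

where $\lambda$ is to be determined so that

$$\sum Y''^2_\alpha = 1.$$

Now

$$\sum Y''^2_\alpha = \lambda^2 \left\{ \sum \cos^2 (k x_\alpha + \epsilon) - \frac{\sin^2 (\tfrac12 nk) \cos^2 [\epsilon + \tfrac12 k(n - 1)]}{n \sin^2 (\tfrac12 k)} \right\}$$

and

$$\sum \cos^2 (k x_\alpha + \epsilon) = \frac{n}{2} + \frac12 \, \frac{\sin nk \, \cos [2\epsilon + k(n - 1)]}{\sin k}$$

and hence

$$\frac{1}{\lambda^2} = \frac{n}{2} + \frac12 \, \frac{\sin nk \, \cos [2\epsilon + k(n - 1)]}{\sin k} - \frac{\sin^2 (\tfrac12 nk) \cos^2 [\epsilon + \tfrac12 k(n - 1)]}{n \sin^2 (\tfrac12 k)},$$

the expression being continuous at $k = \pi$.
Then

$$Y''_\alpha = \lambda (\cos (k x_\alpha + \epsilon) - \bar f) = \lambda \cos (k x_\alpha + \epsilon) + \xi, \text{ say.}$$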


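Regarding $k$ and $\epsilon$ as curvilinear coordinates of a point on the surface, we apply
the formula

$$\sqrt{EG - F^2}\, dk\, d\epsilon$$

for the element of surface area, where

$$E = \sum \left(\frac{\partial Y''_\alpha}{\partial k}\right)^2, \qquad F = \sum \frac{\partial Y''_\alpha}{\partial k}\,\frac{\partial Y''_\alpha}{\partial \epsilon}, \qquad G = \sum \left(\frac{\partial Y''_\alpha}{\partial \epsilon}\right)^2.$$

In evaluating these summations, we shall need the following results: $\sum Y''_\alpha = 0$,
$\sum Y''^2_\alpha = 1$, from which we obtain

$$\sum \cos (k x_\alpha + \epsilon) = -\frac{n\xi}{\lambda} \tag{1}$$

$$\sum \cos^2 (k x_\alpha + \epsilon) = \frac{1 + n\xi^2}{\lambda^2}. \tag{2}$$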


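Differentiating these relations, and writing suffixes $k$ and $\epsilon$ for partial derivatives, we have

$$\sum x_\alpha \sin (k x_\alpha + \epsilon) = n \frac{\partial}{\partial k}\left(\frac{\xi}{\lambda}\right) = \frac{n\xi_k}{\lambda} - \frac{n\xi\lambda_k}{\lambda^2} \tag{3}$$

$$\sum \sin (k x_\alpha + \epsilon) = n \frac{\partial}{\partial \epsilon}\left(\frac{\xi}{\lambda}\right) = \frac{n\xi_\epsilon}{\lambda} - \frac{n\xi\lambda_\epsilon}{\lambda^2} \tag{4}$$

$$\sum x_\alpha^2 \sin^2 (k x_\alpha + \epsilon) = \frac12 \sum x_\alpha^2 + \frac14 \frac{\partial^2}{\partial k^2}\left(\frac{1 + n\xi^2}{\lambda^2}\right) = \frac{n(n-1)(2n-1)}{12} + \frac{n(\xi_k^2 + \xi\xi_{kk})}{2\lambda^2} - \frac{2n\xi\xi_k\lambda_k}{\lambda^3} + (1 + n\xi^2)\left(\frac{3\lambda_k^2}{2\lambda^4} - \frac{\lambda_{kk}}{2\lambda^3}\right) \tag{5}$$

$$\sum x_\alpha \cos (k x_\alpha + \epsilon) \sin (k x_\alpha + \epsilon) = -\frac12 \frac{\partial}{\partial k}\left(\frac{1 + n\xi^2}{\lambda^2}\right) = \frac{\lambda_k (1 + n\xi^2)}{\lambda^3} - \frac{n\xi\xi_k}{\lambda^2} \tag{6}$$

$$\sum \cos (k x_\alpha + \epsilon) \sin (k x_\alpha + \epsilon) = -\frac12 \frac{\partial}{\partial \epsilon}\left(\frac{1 + n\xi^2}{\lambda^2}\right) = \frac{\lambda_\epsilon (1 + n\xi^2)}{\lambda^3} - \frac{n\xi\xi_\epsilon}{\lambda^2} \tag{7}$$

$$\sum x_\alpha \sin^2 (k x_\alpha + \epsilon) = \frac12 \sum x_\alpha + \frac14 \frac{\partial^2}{\partial k\, \partial \epsilon}\left(\frac{1 + n\xi^2}{\lambda^2}\right) = \frac{n(n-1)}{4} + \frac{n(\xi_k\xi_\epsilon + \xi\xi_{k\epsilon})}{2\lambda^2} - \frac{n\xi(\xi_k\lambda_\epsilon + \xi_\epsilon\lambda_k)}{\lambda^3} + (1 + n\xi^2)\left(\frac{3\lambda_k\lambda_\epsilon}{2\lambda^4} - \frac{\lambda_{k\epsilon}}{2\lambda^3}\right). \tag{8}$$

Also

$$\frac{\partial Y''_\alpha}{\partial k} = \lambda_k \cos (k x_\alpha + \epsilon) - \lambda x_\alpha \sin (k x_\alpha + \epsilon) + \xi_k,$$

$$\frac{\partial Y''_\alpha}{\partial \epsilon} = \lambda_\epsilon \cos (k x_\alpha + \epsilon) - \lambda \sin (k x_\alpha + \epsilon) + \xi_\epsilon,$$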


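so that with the above definitions of $E$, $F$, $G$ we obtain

$$E = \lambda_k^2 \sum \cos^2 (k x_\alpha + \epsilon) + \lambda^2 \sum x_\alpha^2 \sin^2 (k x_\alpha + \epsilon) + n\xi_k^2 - 2\lambda\lambda_k \sum x_\alpha \cos (k x_\alpha + \epsilon) \sin (k x_\alpha + \epsilon) - 2\lambda\xi_k \sum x_\alpha \sin (k x_\alpha + \epsilon) + 2\lambda_k\xi_k \sum \cos (k x_\alpha + \epsilon)$$

$$= \lambda^2\, \frac{n(n-1)(2n-1)}{12} + \frac{\lambda_k^2 - \lambda\lambda_{kk}}{2\lambda^2} + \frac{n\lambda^2}{2}\left(\bar f \bar f_{kk} - \bar f_k^2\right),$$

$$F = \lambda_k\lambda_\epsilon \sum \cos^2 (k x_\alpha + \epsilon) + \lambda^2 \sum x_\alpha \sin^2 (k x_\alpha + \epsilon) + n\xi_k\xi_\epsilon - \lambda\lambda_k \sum \sin (k x_\alpha + \epsilon) \cos (k x_\alpha + \epsilon) + \xi_\epsilon\lambda_k \sum \cos (k x_\alpha + \epsilon) - \lambda\xi_\epsilon \sum x_\alpha \sin (k x_\alpha + \epsilon) + \lambda_\epsilon\xi_k \sum \cos (k x_\alpha + \epsilon) - \lambda\lambda_\epsilon \sum x_\alpha \sin (k x_\alpha + \epsilon) \cos (k x_\alpha + \epsilon) - \lambda\xi_k \sum \sin (k x_\alpha + \epsilon)$$

$$= \frac{\lambda_k\lambda_\epsilon - \lambda\lambda_{k\epsilon}}{2\lambda^2} + \lambda^2\, \frac{n(n-1)}{4} + \frac{n\lambda^2}{2}\left(\bar f \bar f_{k\epsilon} - \bar f_k \bar f_\epsilon\right),$$

$$G = \lambda_\epsilon^2 \sum \cos^2 (k x_\alpha + \epsilon) + \lambda^2 \sum \sin^2 (k x_\alpha + \epsilon) + n\xi_\epsilon^2 - 2\lambda\xi_\epsilon \sum \sin (k x_\alpha + \epsilon) - 2\lambda\lambda_\epsilon \sum \cos (k x_\alpha + \epsilon) \sin (k x_\alpha + \epsilon) + 2\lambda_\epsilon\xi_\epsilon \sum \cos (k x_\alpha + \epsilon)$$

$$= -\frac{\lambda_\epsilon^2}{\lambda^2} + n\lambda^2 - n\lambda^2\left(\bar f^2 + \bar f_\epsilon^2\right) - 1,$$

after using the relation $\xi = -\lambda \bar f$ to eliminate $\xi$.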
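These relations give

$$EG - F^2 = \left(\lambda^2\, \frac{n(n-1)(2n-1)}{12} + \frac{\lambda_k^2 - \lambda\lambda_{kk}}{2\lambda^2} + \frac{n\lambda^2}{2}\left(\bar f \bar f_{kk} - \bar f_k^2\right)\right) \times \left(-\frac{\lambda_\epsilon^2}{\lambda^2} + n\lambda^2 - (n\bar f^2\lambda^2 + 1) - n\lambda^2 \bar f_\epsilon^2\right) - \left(\frac{\lambda_k\lambda_\epsilon - \lambda\lambda_{k\epsilon}}{2\lambda^2} + \frac{\lambda^2 n(n-1)}{4} + \frac{n\lambda^2 \bar f \bar f_{k\epsilon}}{2} - \frac{n\lambda^2 \bar f_k \bar f_\epsilon}{2}\right)^2. \tag{9}$$

The area of the surface on which $Y''$ lies is

$$\iint \sqrt{EG - F^2}\, dk\, d\epsilon$$

over an appropriate range of values of $k$ and $\epsilon$, but it appears that this integral
cannot be evaluated exactly. We shall obtain an approximation for large
values of $n$, by obtaining approximations to $\lambda$, $\bar f$, and their derivatives, when $n$
is large.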


The range of periods, $2\pi/k$, will be considered to vary from quantities greater
than one up to half the range, that is $\tfrac12(n - 1)$. This is chosen on the grounds
that the intervals of time would be adjusted so that there would be no expecta-
tion of periods less than the interval, and that enough observations would be
chosen to include at least two periods in the range. Although this supposes
some a priori knowledge of the possible periods, it seems reasonable to expect
that the experimenter would have at least a rough idea of the range of periods
which might fit his data before attempting to fit a harmonic curve. This range
gives a range of values of $k$ from $4\pi/(n - 1)$ to $2\pi(1 - \nu)$ where $\nu$ is arbitrarily
small, but fixed. In all cases the epoch, $\epsilon$, varies from 0 to $2\pi$.

It is readily seen that the surface is traced out only once for this range of
values of $k$, $\epsilon$, so that the problem in its approximate form is reduced to that of
the evaluation of the definite integral

$$\int_0^{2\pi} \int_{4\pi/(n-1)}^{2\pi(1-\nu)} \sqrt{EG - F^2}\, dk\, d\epsilon.$$

We shall obtain the approximations mentioned above, in the first place excluding
from consideration values of $k$ in the neighbourhood of $0$, $\pi$, $2\pi$, the integrals
over small ranges including these values being obtained separately.
If $k$ is not in the neighbourhood of $0$, $\pi$, $2\pi$, we note that

$$\frac{\sin (\tfrac12 nk) \cos [\epsilon + \tfrac12 k(n - 1)]}{\sin (\tfrac12 k)}$$

is a bounded function of $k$, the upper bound being independent of $k$, and at most
equal to $|\operatorname{cosec} \tfrac12 k_1|$, where $k_1$ is the angle in the range considered nearest to $0$,
$\pi$, $2\pi$. Similarly the upper bound of

$$\frac{\sin (nk) \cos [2\epsilon + k(n - 1)]}{\sin k}$$

is at most $|\operatorname{cosec} k_1|$. Hence as $n$ is increased, we may expand $\lambda\sqrt{n}$ in
ascending powers of $n^{-1}$. For large $n$, therefore, $\lambda = O(n^{-1/2})$, and is approximately
$(2/n)^{1/2}$. Since differentiation with respect to $k$ introduces a multiplying factor $n$
in some of the terms, it follows that this is compensated for by the factor $\lambda^{-3}$
which occurs in the denominator of the derivative, and we may conclude that
$\lambda_k = O(n^{-1/2})$. No such compensating factor $n$ occurs in the numerator of $\lambda_\epsilon$,
and it is therefore of order $n^{-3/2}$. It may readily be seen without actually
evaluating the derivatives, which are very long and unwieldy expressions, that

$$\lambda_{kk} = O(n^{1/2}), \quad \lambda_{k\epsilon} = O(n^{-1/2}), \quad \bar f_\epsilon = O(n^{-1}), \quad \bar f_k = O(1), \quad \bar f_{kk} = O(n), \quad \bar f_{k\epsilon} = O(1).$$

We thus see that the term of highest order in $E$ is

$$\frac{(n - 1)n(2n - 1)}{12}\,\lambda^2 = O(n^2).$$

The term of highest order in $G$ is $n\lambda^2 - 1$.

The term of highest order in $F$ is

$$\frac{n(n - 1)}{4}\,\lambda^2.$$

These are approximately constant for large $n$, and are equal to $n^2/3$, $1$, $n/2$ to a
first order of approximation. Hence

$$\sqrt{EG - F^2} \sim \frac{n}{\sqrt{12}}.$$
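The leading-order values of $E$, $G$, $F$ can be checked numerically. The sketch below is an illustration only and is not the author's procedure: it builds $Y''$ directly from its definition and replaces the analytic derivatives by central finite differences. The step size `h`, the sample point $(k, \epsilon) = (1.0, 0.3)$ and the value $n = 400$ are arbitrary assumptions.

```python
import math

def unit_projection(n, k, eps):
    """Y'' for x = 0, ..., n-1: cosine values, centred on their mean and scaled to unit length."""
    cosines = [math.cos(k * x + eps) for x in range(n)]
    mean = sum(cosines) / n
    centred = [c - mean for c in cosines]
    norm = math.sqrt(sum(c * c for c in centred))
    return [c / norm for c in centred]

def first_fundamental_form(n, k, eps, h=1e-5):
    """Approximate E, F, G by central differences in k and eps (a numerical stand-in)."""
    up_k, down_k = unit_projection(n, k + h, eps), unit_projection(n, k - h, eps)
    up_e, down_e = unit_projection(n, k, eps + h), unit_projection(n, k, eps - h)
    d_k = [(p - q) / (2 * h) for p, q in zip(up_k, down_k)]
    d_e = [(p - q) / (2 * h) for p, q in zip(up_e, down_e)]
    E = sum(a * a for a in d_k)
    F = sum(a * b for a, b in zip(d_k, d_e))
    G = sum(b * b for b in d_e)
    return E, F, G

n = 400
E, F, G = first_fundamental_form(n, k=1.0, eps=0.3)
area_element = math.sqrt(E * G - F * F)  # to be compared with n / sqrt(12)
```

For a value of $k$ well away from $0$, $\pi$, $2\pi$ the computed quantities should lie within a few per cent of $n^2/3$, $n/2$, $1$ and $n/\sqrt{12}$ respectively; the agreement is only asymptotic, so some discrepancy of relative order $1/n$ is expected.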
The range for $k$ may be broken up as follows:

(a) from $\dfrac{4\pi}{n - 1}$ to $\dfrac{a}{\sqrt{n}}$, where $a$ is a finite angle, independent of $n$.

(b) from $\dfrac{a}{\sqrt{n}}$ to $\pi - \dfrac{a}{\sqrt{n}}$

(c) from $\pi - \dfrac{a}{\sqrt{n}}$ to $\pi + \dfrac{a}{\sqrt{n}}$

(d) from $\pi + \dfrac{a}{\sqrt{n}}$ to $2\pi - \dfrac{a}{\sqrt{n}}$

(e) from $2\pi - \dfrac{a}{\sqrt{n}}$ to $2\pi(1 - \nu)$.

The method of procedure will be to show that in ranges (a), (c), (e) the integrand
is of order $n$, and that since the ranges in all three cases are of order $n^{-1/2}$, the
values of the integrals in these ranges are $O(n^{1/2})$ which is negligible in comparison
with the contributions from (b) and (d), which are $O(n)$.


In (a), <_< < =t we put k = = a = 4m, and let 6 range from p to 


n'- 
, Where pis a positive quantity defined by the relation (n — 1) = rs 
\ T are of orders n’ and n° respectively. 
orders of the derivatives are: 


NK Arce Oe 
nt yt 


Then 
For this range of aioe of 5, the 
These being decreased, it follows that the order of $\sqrt{EG - F^2}$ is not increased
for any positive $\delta$, and $\sqrt{EG - F^2} = O(n)$ as before.


In (c), $\pi - \dfrac{a}{\sqrt{n}} < k < \pi + \dfrac{a}{\sqrt{n}}$, we put $k = \pi \pm \dfrac{a}{n^{1-\delta}}$ according as $k \gtrless \pi$, and
consider $0 < \delta < \tfrac12$. The orders of the derivatives are as stated in (a) above for
this range. The remainder of the range, $\pi - \dfrac{a}{n} < k < \pi + \dfrac{a}{n}$, is such that the
values of the derivatives are of orders as stated with $\delta = 0$, while $\lambda = O(n^{-1/2})$.
Thus $\sqrt{EG - F^2} = O(n)$ throughout.

In (e), $2\pi - \dfrac{a}{\sqrt{n}} < k < 2\pi(1 - \nu)$, we put $k = 2\pi - \dfrac{a}{n^{1-\delta}}$, and consider $0 < \delta < \tfrac12$.
In this range the orders of the derivatives are as in (a). In the remainder
of the range, $2\pi - \dfrac{a}{n} < k < 2\pi(1 - \nu)$, the orders of the derivatives are as in
(a) with $\delta = 0$, so that $\sqrt{EG - F^2} = O(n)$.
As the ranges (b) and (d) are not independent of $n$, it remains to be shown
that this fact does not affect the final result. We consider, therefore, $k = \dfrac{a}{n^{1-\delta}}$
and $k = \pi - \dfrac{a}{n^{1-\delta}}$, where $\tfrac12 < \delta < 1$, and since, as in (a), the second and third
terms in the denominator of $\lambda$ are $O(n^{1-\delta})$ and $O(n^{1-2\delta})$ or $O(n^{-1})$ respectively,
$\lambda \sim \sqrt{2/n}$, while the derivatives have values as in case (a). Thus, in these
ranges, $\sqrt{EG - F^2} \sim \dfrac{n}{\sqrt{12}}$ throughout. Thus we may conclude in all cases
that $\sqrt{EG - F^2} = O(n)$.


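The surface area

$$= \int_0^{2\pi} \left[ \int_{a/\sqrt{n}}^{\pi - a/\sqrt{n}} + \int_{\pi + a/\sqrt{n}}^{2\pi - a/\sqrt{n}} \right] \sqrt{EG - F^2}\, dk\, d\epsilon + \int_0^{2\pi} \left[ \int_{4\pi/(n-1)}^{a/\sqrt{n}} + \int_{\pi - a/\sqrt{n}}^{\pi + a/\sqrt{n}} + \int_{2\pi - a/\sqrt{n}}^{2\pi(1-\nu)} \right] \sqrt{EG - F^2}\, dk\, d\epsilon.$$

In the first two ranges, $\sqrt{EG - F^2} \sim \dfrac{n}{\sqrt{12}}$.

In the last three ranges, $\sqrt{EG - F^2} = O(n)$ and therefore the integral $= O(n^{1/2})$.
Thus the area is equal to

$$\frac{n}{\sqrt{12}}\, 2\pi \left( 2\pi - \frac{4a}{\sqrt{n}} \right) + \text{terms of lower order} \sim \frac{4\pi^2 n}{\sqrt{12}} = \frac{2}{3}\sqrt{3}\, \pi^2 n.$$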
In the case of fitting a linear regression with 3 independent variates, the
distribution of $R$ is well known to be

$$\frac{\Gamma[\tfrac12(n - 1)]}{\Gamma(\tfrac32)\, \Gamma[\tfrac12(n - 4)]}\, (R^2)^{1/2} (1 - R^2)^{\frac12(n-6)}\, d(R^2). \tag{10}$$

It may readily be seen by a repetition of the argument used in the first paragraph
that this expression could be derived by considering the volume of a tube in
spherical space of $(n - 2)$ dimensions, in which the base surface is a 2-dimen-
sional unit sphere of area $4\pi$. We are assuming that the first approximation
to the volume of a tube is equal to the area of the surface multiplied by a fixed
function of $\theta$. If, therefore, we divide this expression by $4\pi$, and take $R$
sufficiently close to 1, or $\theta = \cos^{-1} R$ sufficiently close to zero, we shall obtain the
expression by which to multiply the surface area, (10), in order to obtain the
first approximation to the frequency function of $R$.
Using Stirling’s approximation, we have

$$\Gamma[\tfrac12(n - 1)] \sim \sqrt{2\pi}\, e^{-\frac12(n-1)}\, [\tfrac12(n - 1)]^{\frac12(n-2)}$$

and

$$\Gamma[\tfrac12(n - 4)] \sim \sqrt{2\pi}\, e^{-\frac12(n-4)}\, [\tfrac12(n - 4)]^{\frac12(n-5)}.$$

The ratio of these

$$= 2^{-3/2}\, e^{-3/2} \left(1 + \frac{3}{n - 4}\right)^{\frac12(n-4)} (n - 1)(n - 4)^{1/2} \sim 2^{-3/2}\, n^{3/2}.$$

Hence the multiplying constant is approximately $n^{3/2}/\sqrt{2\pi}$. Substituting
$R = \cos\theta$ in the frequency function divided by this constant, we obtain
$2\cos^2\theta \sin^{n-6}\theta \sin\theta\, d\theta$, giving $2\theta^{n-5}\, d\theta$ as the first approximation.
Hence the approximate frequency function for the quantity $\theta$ in the case of
periodogram analysis is

$$\frac{n^{3/2}}{\sqrt{2\pi}} \cdot \frac{1}{4\pi} \cdot \frac{2\pi^2 \sqrt{3}\, n}{3} \cdot 2\theta^{n-5}\, d\theta.$$

Thus the first approximation to the probability that $\theta$ should be as great or
greater is

$$2^{-1/2}\, n^{3/2}\, \pi^{1/2}\, 3^{-1/2}\, \theta^{n-4}$$

approximately.
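The two approximations used in this step can be illustrated numerically. The sketch below is an assumption-laden illustration, not the author's computation: it compares the exact ratio of gamma functions (through `math.lgamma`) with the Stirling value $2^{-3/2} n^{3/2}$, and evaluates the final first approximation $2^{-1/2} 3^{-1/2} \pi^{1/2} n^{3/2} \theta^{n-4}$. The function names and the sample values of $n$ and $\theta$ are hypothetical.

```python
import math

def gamma_ratio_exact(n):
    """Gamma((n-1)/2) / Gamma((n-4)/2), computed through log-gamma to avoid overflow."""
    return math.exp(math.lgamma(0.5 * (n - 1)) - math.lgamma(0.5 * (n - 4)))

def gamma_ratio_stirling(n):
    """Leading-order value 2^(-3/2) * n^(3/2) obtained from Stirling's approximation."""
    return 2.0 ** -1.5 * n ** 1.5

def surface_area_approx(n):
    """Leading term of the surface area: (2/3) * sqrt(3) * pi^2 * n."""
    return (2.0 / 3.0) * math.sqrt(3.0) * math.pi ** 2 * n

def tail_probability_approx(n, theta):
    """First approximation 2^(-1/2) 3^(-1/2) pi^(1/2) n^(3/2) theta^(n-4); small theta only."""
    return math.sqrt(math.pi / 6.0) * n ** 1.5 * theta ** (n - 4)

n = 1000
relative_gap = abs(gamma_ratio_exact(n) / gamma_ratio_stirling(n) - 1.0)
p = tail_probability_approx(50, 0.5)  # hypothetical sample: n = 50, theta = 0.5 radians
```

The approximation is meaningful only when $\theta$ is small enough for the result to be a small probability; for larger $\theta$ the expression can exceed 1.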
The approximations which have been introduced have been forced upon us
by the limitations of the mathematical machinery involved. It must be ad-
mitted that these approximations are not those which the experimenter would
choose, for the following obvious reason. If we are testing the null hypothesis
that the population correlation is zero, for large values of $n$ the sample correla-
tion will approach its expectation value, namely zero, and we shall in general be
interested in values of $R$ which are small, and corresponding values of $\theta$ in the
neighborhood of $\pi/2$. This situation is not provided for in this investigation.
It may be, however, that there exists a large correlation in the population,
and that owing to the large number in the sample the value of $R$ calculated is
near this value. Provided that this population correlation is sufficiently close
to unity, the value of $\theta$ will be small enough to apply the distribution obtained
above, and in such a case will enable us to reject the null hypothesis when the
probability calculated from the distribution is sufficiently small.
ON THE APPLICATION OF THE Z-TEST TO RANDOMIZED BLOCKS


By M. D. McCARTHY


1. Introduction. When a series of experiments is performed with the object 
of measuring some quantity, it is implicitly postulated that the quantity in 
question has a “true value,’ which is theoretically obtainable as the result of an 
infinite repetition of the experiment under the standard conditions. In certain 
experiments, especially those of physical and chemical science, the materials and 
the methods employed are subject to such accurate control by the experimenter 
that he can repeat his experiment again and again with the “essential’’ factors 
kept constant, and with biassed errors eliminated. This repetition gives a series 
of observations of the ‘‘true value” in question subject only to random errors. 
All that is needed, usually, to increase the accuracy of the estimate of the “‘true 
value” is to continue the repetition of the experiment. Not only does such a 
repetition make the estimate more exact but it also provides an estimate of the 
degree of accuracy present, permits a comparison between different quantities 
and makes it possible to test various hypotheses as to their relative values. 

In many cases which arise, notably in biological and social science and in 
dealing with data provided by modern mass-production methods, it is a practical 
impossibility to repeat an experiment under the same essential conditions. The 
material available is definitely non-homogeneous with regard to at least some 
of the qualities influencing the results. In testing, for instance, a number of
varieties of some plant, to find which gives the best yield, it is possible to
guarantee that, to a high degree of accuracy, all the varieties are cultivated alike.
If a relatively small area is covered by the experimental plots, it can be said
that all the varieties experience the same climatic conditions and it is not diffi-
cult to ensure that they are all treated alike as to measurement of produce and
so on. It is, however, practically impossible to make the plots, on which the
varieties are grown, homogeneous as regards fertility of the soil and, even if 
this were possible, it would partially defeat the purpose of the experiment which 
is to test the varieties over a certain limited range of soil types. In a similar 
way in many other fields of biological or social experiments a similar non- 
homogeneity of the experimental material exists. 

In experimenting with homogeneous materials, where the conditions of the 
whole series of experiments are the same, the differences which occur between 
the theoretical “true value” and the observations are explained as being due 
to a multiplicity of causes outside the control of the experimenter and of such a 
nature that their incidence varies “randomly” from experiment to experiment. 
It is the fact that certain fundamental factors influencing the results are
definitely non-random in their incidence which differentiates the experiments with
non-homogeneous material from the others, and it is by artificially introducing
randomization, as suggested by Fisher [1, 2, 3], that such experiments are made
amenable to the usual error laws.
For convenience, in what follows, the word ‘‘variety” will be used when 
speaking of a single object of those under test, whether it actually be a variety 
of some plant, a manurial treatment, a method of feeding or anything else of 
the sort. For instance, if five varieties and three manurial treatments are 
being tested in the same experiment, a “variety” would be any one of the fifteen 
combinations of an actual variety under test with a manure. The word “plot”
will be used for that portion of the non-homogeneous material which is required
for the performance of an experiment on a single “variety,” and the term “yield”
will be applied to the value of the observed quantity obtained as the result of
testing a “variety” on a single “plot.” The plots are, of course, equalized with
respect to “size,” or whatever similar property would influence the test.


2. Randomized Blocks. Suppose that there are s varieties to be tested and 
that the necessary replication is attained by testing each variety on n separate 
plots. That the plots on which each variety is tested form a random sample 
of the material available is guaranteed by assigning each of the s varieties to n 
of the available ns plots at random, that is, as the result of a physical random 
experiment with cards, dice, or the like. This method of randomization may 
be so employed that no restrictions are put on the plots to which the varieties 
are assigned, or it may be further refined in different ways so that, while pre- 
serving the random nature of the assignment, certain restrictions may be placed 
on it. Such a method of randomization with restrictions is the method known 
as ‘‘randomized blocks.” 

The basic idea is that compact “blocks” of the non-homogeneous material are,
probably, much more uniform than the material taken as a whole. Conse-
quently, the material is first divided into $n$ such “blocks,” as compact and
uniform as possible, each block containing $s$ equal plots. Each of the $s$ varieties
under test is assigned to a single plot in every block and randomness is attained
by making the assignment of the varieties to the plots in each block as the
result of a separate random experiment. Thus the $n$ plots to which each variety
is assigned do actually form a random sample of the non-homogeneous material
with the restriction that to each plot of any variety corresponds a plot of any
other variety from the same block.
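The assignment procedure described above can be sketched in a few lines. This is an illustration only: a pseudo-random shuffle stands in for the physical random experiment with cards or dice, and the function name `randomized_blocks`, the variety labels and the `seed` parameter are assumptions introduced here.

```python
import random

def randomized_blocks(n_blocks, varieties, seed=None):
    """Return one independent random ordering of the varieties for each block.

    layout[j][l] is the variety assigned to plot l of block j, so every variety
    occupies exactly one plot in every block.
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(varieties)
        rng.shuffle(block)  # a separate random experiment for each block
        layout.append(block)
    return layout

# Hypothetical example: s = 4 varieties tested in n = 3 blocks.
varieties = ["A", "B", "C", "D"]
layout = randomized_blocks(3, varieties, seed=1939)
```

The restriction that distinguishes randomized blocks from unrestricted randomization is visible in the structure: each inner list is a permutation of all $s$ varieties, rather than a random draw from the whole set of $ns$ plots.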


3. Mathematical Formulation. $X_{jl(k)}$ denotes the “true yield” of the $k$th
variety which would be obtained by testing it on the $l$th plot in the $j$th block.
$k = 1, 2, \ldots, s$ denotes the number by which the variety is known, $l = 1,
2, \ldots, s$ the order-number of the plot in the block and $j = 1, 2, \ldots, n$ the
number of the block. Following Neyman [4, p. 110] we define the “true yield,”
again with particular reference to agricultural experiments, as:

“Suppose that the experiment is repeated indefinitely without any change of
vegetative conditions or of arrangement so that the $k$th variety is always tested
on the plot $(j, l)$. The yields from this plot will form a population, say $\Pi_{jl(k)}$,
and $X_{jl(k)}$ is defined as the mean of this population.”

Thus, in any block, there are $s^2$ different possible populations with corre-
sponding “true values,” but in any single experiment on that block observa-
tions will be obtained from only $s$ of the $s^2$ possible populations. To distinguish
those populations for which an observation is available from those which are
entirely hypothetical $X_{j(k)}$ will denote the “true yield,” as already defined, of
the $k$th variety on the plot to which it has been assigned in the $j$th block. Since
this assignment has been carried out as the result of a random experiment the
“true yield” is itself a random variable; $X_{j(k)}$ is randomly selected from the set of $s$
possible values $X_{j1(k)}, X_{j2(k)}, \ldots, X_{js(k)}$.

Using the dot notation to denote the mean of a quantity taken over all values
of the letter replaced by the dot, it is clear that

$$X_{jl(k)} = X_{\cdot\cdot(k)} + [X_{j\cdot(k)} - X_{\cdot\cdot(k)}] + [X_{jl(k)} - X_{j\cdot(k)}] = X_{\cdot\cdot(k)} + B_{jk} + u_{jl(k)},$$

and

$$X_{j(k)} = X_{\cdot\cdot(k)} + [X_{j\cdot(k)} - X_{\cdot\cdot(k)}] + [X_{j(k)} - X_{j\cdot(k)}] = X_{\cdot\cdot(k)} + B_{jk} + \eta_{jk},$$

where

$$B_{jk} = X_{j\cdot(k)} - X_{\cdot\cdot(k)}, \qquad u_{jl(k)} = X_{jl(k)} - X_{j\cdot(k)}$$

and

$$\eta_{jk} = X_{j(k)} - X_{j\cdot(k)}.$$

Obviously

$$\sum_{j=1}^{n} B_{jk} = 0 \quad\text{and}\quad \sum_{l=1}^{s} u_{jl(k)} = 0$$

from their definitions, while $\eta_{jk}$ is a random variable, with zero expectation,
selected from the sequence $u_{j1(k)}, u_{j2(k)}, \ldots, u_{js(k)}$. Neyman (loc. cit.) calls
$\eta_{jk}$, thus defined, the “soil error” of the $k$th variety when tested on its assigned
plot in the $j$th block. The actual yield observed when the $k$th variety is tested
on its assigned plot in the $j$th block is $x_{jk}$ and the difference $x_{jk} - X_{j(k)} = \epsilon_{jk}$
is termed the “technical error.” Clearly

$$x_{jk} = X_{\cdot\cdot(k)} + B_{jk} + \eta_{jk} + \epsilon_{jk}. \tag{1}$$


Both “soil error” and “technical error” enter into any comparisons which may 
be made and it is well known that the major source of error in, for instance, 
agricultural experiments is that due to the heterogeneity of the soil. As regards 
the relative magnitudes of the two errors, that of course depends on the experi- 
ment in question, but Fisher [5] has stated that in an agricultural uniformity 
trial (i.e. when the same variety is tested on all the plots) yields from plots of
1/40th of an acre frequently vary sufficiently among themselves, owing to soil 
heterogeneity, so as to give a standard deviation of ten per cent of the mean 
yield, while the inevitable random errors in treating the plots can be kept down 
to a much lower figure. By confining the randomization to a “block’’ of the 
material, which comprises only a relatively small compact portion of the whole 
material under test, the effects of soil heterogeneity may be much decreased. 
It appears, however, that it may very often be an unwarranted simplification to 
consider that the “‘true yield” of a variety is the same for all plots of a given 
block. 

The two types of “error” are random variables of altogether different proper-
ties. Both have zero expectation and may be considered as independent of one
another in the probability sense. It, therefore, appears reasonable to assume
that $\epsilon_{jk}$ is independent both of the “technical error” in any other observation
and of the $\eta$’s. On the other hand $\eta_{jk}$ is a random variable selected from the
sequence

$$u_{j1(k)}, u_{j2(k)}, \ldots, u_{js(k)} \tag{2}$$

and since, if $\eta_{jk}$ has the value $u_{jl(k)}$, $\eta_{jm}$ is free to assume any one of the values
$u_{j1(m)}, u_{j2(m)}, \ldots, u_{js(m)}$ except $u_{jl(m)}$, it is clear that $\eta_{jk}$ and $\eta_{jm}$ are not inde-
pendent. In the case of $\eta_{j'k}$ and $\eta_{j''m}$ where $j' \neq j''$, the random variables are
drawn as the result of two separate random experiments from different sequences
of the type (2). Obviously this means that the “soil errors” for different blocks
are independent for either the same or different varieties. Writing $E$ for the
expected value, or the mean value in repeated experiments, since

$$\sum_{l=1}^{s} u_{jl(k)} = \sum_{l=1}^{s} u_{jl(m)} = 0,$$

the variance of $\eta_{jk}$ is $\sigma^2_{\eta_{jk}}$, with

$$\sigma^2_{\eta_{jk}} = s^{-1} \sum_{l=1}^{s} u^2_{jl(k)} \tag{3}$$

and also

$$E[\eta_{jk}\, \eta_{jm}] = -\{s(s - 1)\}^{-1} \sum_{l=1}^{s} u_{jl(k)}\, u_{jl(m)}. \tag{4}$$

Using (1), (3) and (4) it follows that

$$E[x_{jk}] = X_{\cdot\cdot(k)} + B_{jk} = a_{jk}, \text{ say,} \tag{5}$$

$$E[(x_{jk} - a_{jk})^2] = E[(\eta_{jk} + \epsilon_{jk})^2] = \sigma^2_{\eta_{jk}} + \sigma^2_{\epsilon_j}, \tag{6}$$

$$E[(x_{j'k} - a_{j'k})(x_{j''m} - a_{j''m})] = E[(\eta_{j'k} + \epsilon_{j'k})(\eta_{j''m} + \epsilon_{j''m})] = E[\eta_{j'k}\, \eta_{j''m}].$$
The expectations of the various product terms on the right-hand sides of these
equations vanish except in the case of the last one. If $j' \neq j''$ it too vanishes,
whatever the values of $k$ and $m$, and it follows that the correlation of the observed
yields of any two varieties, or of the same variety obtained from different blocks
is zero. It is clear, however, that such is not the case when the yields are ob-
tained from plots on the same block. Denoting by $\rho_{j(km)}$ the coefficient of
correlation between $x_{jk}$ and $x_{jm}$ and using (4)

$$\rho_{j(km)} = \rho_{j(mk)} = -\frac{\sum_{l=1}^{s} u_{jl(k)}\, u_{jl(m)}}{s(s - 1)\, \{\sigma^2_{\eta_{jk}} + \sigma^2_{\epsilon_j}\}^{1/2}\, \{\sigma^2_{\eta_{jm}} + \sigma^2_{\epsilon_j}\}^{1/2}} \tag{7}$$

when $k \neq m$, while, of course, $\rho_{j(kk)} = 1$.
It may be noted that even when two sequences such as in (2) are identical
the correlation $\rho_{j(km)}$ is not zero. In this case, when the varieties react in
exactly the same way to variations in fertility within a block, $\sigma^2_{\eta_{jk}} = \sigma^2_{\eta_{jm}} = \sigma^2_{\eta_j}$,
say, and

$$\rho_{j(km)} = -(s - 1)^{-1} \left\{1 + \frac{\sigma^2_{\epsilon_j}}{\sigma^2_{\eta_j}}\right\}^{-1}. \tag{7a}$$

Then the coefficient of correlation is negative and depends only on the relative 
magnitude of the technical and soil errors for the block in question, and on the 
number of plots in the block. In a given block it is greatest in absolute magni- 
tude when the technical error is zero, or at any rate negligible with respect to 
the soil error which, of course, is usually uncontrollable. In order to have zero 
correlation between the yields of every pair of varieties it must be assumed 
either that (a) there is such a complete lack of relationship between the ways 
in which the various varieties react to the differences of fertility within a block 


that for each pair of varieties $k$ and $m$ all the products such as $\sum_{l=1}^{s} u_{jl(k)}\, u_{jl(m)}$
vanish identically even though the $u$’s themselves are not zero, an assumption
that lacks plausibility, or else that (b) all members of each sequence of the 
type (2) are zero. This latter assumption means that no allowance whatsoever 
is needed for variations of fertility within a block. Once variation of fertility 
within a block is admitted it appears only reasonable that it should be taken 
into account and the effect of the resulting correlations on any test concerning 
the yields of different varieties examined. 
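The moments (3) and (4) and the special case (7a) can be verified exactly by enumerating every possible assignment of two varieties to distinct plots of one block. The sketch below is illustrative only: the deviations `u_k` and `u_m`, the technical-error variance and the function names are hypothetical values and names chosen so that each sequence sums to zero, as the definitions require.

```python
from fractions import Fraction
from itertools import permutations

def soil_error_moments(u_k, u_m):
    """Exact variance of eta_k and covariance of (eta_k, eta_m) within one block.

    u_k[l] and u_m[l] are the deviations of varieties k and m on plot l; each list
    sums to zero. Every ordered pair of distinct plots is equally likely.
    """
    s = len(u_k)
    pairs = list(permutations(range(s), 2))
    variance = sum(u_k[lk] ** 2 for lk, _ in pairs) / Fraction(len(pairs))
    covariance = sum(u_k[lk] * u_m[lm] for lk, lm in pairs) / Fraction(len(pairs))
    return variance, covariance

def same_block_correlation(u, technical_variance):
    """Correlation of two yields in a block when both varieties share the sequence u."""
    variance, covariance = soil_error_moments(u, u)
    return covariance / (variance + technical_variance)

# Hypothetical deviations for s = 4 plots; each sequence sums to zero.
u_k = [Fraction(3), Fraction(-1), Fraction(0), Fraction(-2)]
u_m = [Fraction(1), Fraction(2), Fraction(-4), Fraction(1)]
s = len(u_k)

variance, covariance = soil_error_moments(u_k, u_m)
rho = same_block_correlation(u_k, technical_variance=Fraction(1, 2))
```

Using exact fractions avoids rounding, so the enumerated moments can be compared with the closed forms by equality rather than by tolerance.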

Cramér has shown [6, 7] that if the sum of two independent random variables 
be normally distributed each variable must itself follow the normal law. 
Strictly, therefore, it cannot be correct to apply normal theory to the random 
variables xj, in the mathematical model elaborated above, for, though e€;, may 
readily be assumed normally distributed, 7;, can obviously take only a finite 
number, s, of values and, consequently, as its distribution cannot be normal, it is 
impossible that x; can be exactly normally distributed either. However, as a 
first approximation, taking into account the correlations, it will be assumed 
that the yields from any block form a set of single observations of the variables 
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in an s-variate normal distribution. Further, for the sake of simplicity, it will 
be assumed that the variances and covariances of the populations appropriate 
to the different blocks are the same. Dropping the distinguishing j’s, the 
variances of the yields, as in (6) are defined by o; = os, ito ; and pxm is written 
for pjcem) in (7). We define y;, and Azm by 


(8)  $y_{jk} = x_{jk} - a_{jk}$

and

(9)  $A_{km} = \dfrac{\Delta_{km}}{2\sigma_k \sigma_m \Delta}$

where $\Delta$ is the s-rowed determinant $|\rho_{km}|$, symmetrical about its principal
diagonal, $\Delta_{km}$ the cofactor of $\rho_{km}$ in $\Delta$, and A is written for $|A_{km}|$, the
determinant of the positive definite matrix $\|A_{km}\|$. Then since the interblock
covariances are zero, the elementary probability law for the whole set of ns y's
is given by


(10)  $P\{y\} = A^{n/2}\,\pi^{-ns/2} \exp\Big\{-\sum_{j}\sum_{k,m} A_{km}\,y_{jk}\,y_{jm}\Big\}.$

It may be noted that j and l, where they occur, run through all integral values
from 1 to n while k and m take values from 1 to s. A sign such as $\sum_{k,m}$ means
that the summation is taken over all the pairs of values of k and m, the term
(m, k) being taken as distinct from the term (k, m) and including the terms in
which k = m. $\sum_{k \ne m}$ implies a similar summation with the omission of terms in
which k = m.

The distribution law (10), or a similar one substituting the x's for the y's from (8),
takes into account also cases in which, though the correlations may be zero, 
the variances of the different variety yields differ. 


4. The Z-Test. If $\{x_q\}$, $q = 1, 2, \cdots, f_1$, is a set of $f_1$ mutually independent
random variables each of which follows the same normal law with zero mean
and variance $\sigma_1^2$, and if $u_1 = \sum_{q=1}^{f_1} x_q^2$, then the distribution law for $u_1$ is, for $u_1 > 0$,

(11)  $p(u) = \dfrac{a^{f/2}}{\Gamma(\tfrac12 f)}\,u^{\frac12 f - 1}\,e^{-au}$

with $f = f_1$ and $a = (2\sigma_1^2)^{-1}$. If also $u_2 = \sum_{r=1}^{f_2} y_r^2$, where $\{y_r\}$, $r = 1, 2, \cdots, f_2$,
is another set of mutually independent random variables each of which is
normally distributed with zero mean and variance $\sigma_2^2$, then the distribution law of
$u_2$ is (11) with $a = (2\sigma_2^2)^{-1}$ and $f = f_2$. If, in addition to the independence of
the variables within each set, there is also independence between the sets,
then $u_1$ is independent of $u_2$ and the distributions of different functions of $u_1$
and $u_2$ used as criteria may be obtained. The one originally proposed in this
connection was z, defined by

$z = \tfrac12 \log_e (f_1 u_2 / f_2 u_1) - \log_e (\sigma_2/\sigma_1)$

and its distribution law is [8, 9, 10]

(12)  $p(z) = \dfrac{2\,f_1^{f_1/2}\,f_2^{f_2/2}\,\Gamma\{\tfrac12(f_1 + f_2)\}}{\Gamma(\tfrac12 f_1)\,\Gamma(\tfrac12 f_2)} \cdot \dfrac{e^{f_2 z}}{(f_1 + f_2 e^{2z})^{\frac12(f_1 + f_2)}}.$

Any other single-valued, monotone function of $u_2/u_1$ would, when $\sigma_1 = \sigma_2$,
be equivalent to z as a criterion. $F = e^{2z} = f_1 u_2 / f_2 u_1$, $v = u_2/u_1$ and
$w = u_2/(u_1 + u_2)$ have been adopted as criteria and their distribution laws are
readily deduced from (12). All these criteria are equivalent in providing
control of "errors of the first kind" [11, 12], that is, the risk of rejecting a hypothesis
tested when true. As usual the procedure is to select arbitrarily in advance a
certain "level of significance," say $\epsilon$ = 0.05, 0.01 etc., and, assuming the
hypothesis tested is true, to determine the value of the criterion, say the value
$z_0$ of z, such that

(13)  $P\{z > z_0 \mid H\} = \int_{z_0}^{\infty} p(z)\,dz = \epsilon.$

If the sample of observations gives a value of $z > z_0$, H is rejected; if $z < z_0$,
H is accepted. It is merely a matter of convenience which of the criteria
z, F, v or w is used and tables are available to facilitate numerical work. Tables
for z and F are given by Fisher [2], Fisher and Yates [13] and Snedecor [14], 
while for w Tables of the Incomplete Beta Function [15] may be used. 
Though no tables are directly available for v it is the simplest to use in theo- 
retical discussion and in subsequent sections it is its distribution law, and not 
that of z, which will be considered. The latter may, of course, be readily 
deduced. 

Consider the distribution law (10) with $y_{jk}$ replaced by $x_{jk} - a_{jk}$, when
$\rho_{km} = 0$ and $\sigma_k = \sigma_m = \sigma$, i.e., all the observations are normal and independent
with the same variance. Writing

(14)  $u_1 = \sum_{j,k} (x_{jk} - x_{\cdot k} - x_{j\cdot} + x_{\cdot\cdot})^2,$

(15)  $u_2 = \sum_{j,k} (x_{\cdot k} - x_{\cdot\cdot})^2 = n \sum_{k} (x_{\cdot k} - x_{\cdot\cdot})^2,$

(16)  $u_3 = \sum_{j,k} (x_{j\cdot} - x_{\cdot\cdot})^2 = s \sum_{j} (x_{j\cdot} - x_{\cdot\cdot})^2,$

then it is readily seen that

$u_1 + u_2 + u_3 = \sum_{j,k} (x_{jk} - x_{\cdot\cdot})^2.$

Now if $a_{jk}$ may be put in the form $M + B_j + V_k$, with $\sum_j B_j = \sum_k V_k = 0$,
then $u_1$ is distributed as in (11) with f = (n - 1)(s - 1). If, in addition to the
additive assumption, $V_k = 0$ for all values of k then $u_2$ follows the same law,
independently of $u_1$, and with f = s - 1. Similarly if $B_j = 0$ for all values
of j, $u_3$ has the same distribution law with f = n - 1. It may be shown [16]
that if $a_{jk} = M$ for all values of j and k the three quantities $u_1$, $u_2$ and $u_3$ follow
independently the law (11) with suitable values for f, and then the corresponding
values of z follow the law (12).


Making the assumption of additivity for $a_{jk}$, of which, incidentally, the
correctness or adequacy cannot be tested without more than one set of ns
observations of the variables, the z-test may be used to determine whether or
not there is a "block effect" or a "variety effect," i.e., whether $B_j = 0$ or $V_k = 0$
for all values of j and k. For instance, to test the hypothesis $V_k = 0$, k = 1,
2, ..., s, $z = \tfrac12 \log_e \{(n - 1)u_2/u_1\}$ is calculated from the observations and
the hypothesis is rejected if $z > z_0$, where $z_0$ is found from Fisher's tables
corresponding to a suitable value of $\epsilon$ in (13). Otherwise the hypothesis is accepted.
This is the usual method of applying the z-test to randomized blocks.
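The computation just described can be sketched in a few lines. The following is only an illustration under stated assumptions: the function names and the 4 x 3 table of yields are hypothetical and are not taken from the paper; the sums of squares follow (14), (15) and (16), and z is formed as $\tfrac12 \log_e\{(n-1)u_2/u_1\}$.

```python
import math

def block_sums_of_squares(x):
    """Return (u1, u2, u3) of equations (14)-(16) for an n-by-s table x[j][k]
    (n blocks as rows, s varieties as columns)."""
    n, s = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (n * s)            # x..
    block_mean = [sum(row) / s for row in x]                # x_j.
    variety_mean = [sum(x[j][k] for j in range(n)) / n      # x_.k
                    for k in range(s)]
    u1 = sum((x[j][k] - variety_mean[k] - block_mean[j] + grand) ** 2
             for j in range(n) for k in range(s))
    u2 = n * sum((vm - grand) ** 2 for vm in variety_mean)
    u3 = s * sum((bm - grand) ** 2 for bm in block_mean)
    return u1, u2, u3

def z_statistic(x):
    """z = (1/2) log_e{(n - 1) u2 / u1}, the criterion for the variety effect."""
    n = len(x)
    u1, u2, _ = block_sums_of_squares(x)
    return 0.5 * math.log((n - 1) * u2 / u1)

# Hypothetical yields: 4 blocks (rows) by 3 varieties (columns).
yields = [[10.0, 12.0, 15.0],
          [11.0, 14.0, 15.0],
          [9.0, 12.0, 13.0],
          [12.0, 13.0, 17.0]]
u1, u2, u3 = block_sums_of_squares(yields)
z = z_statistic(yields)
```

The value of z would then be compared with the tabulated $z_0$ for $f_1 = (n-1)(s-1)$ and $f_2 = s-1$ degrees of freedom.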

The problem before us now is to consider what happens to such a test when
$\sigma_k \ne \sigma_m$ and $\rho_{km} \ne 0$ in (10), and the hypotheses to be tested must be related
to (1) and (5). As already stated this method of testing hypotheses controls,
at a suitable level, the risk of rejecting the hypothesis when it is true. A 
complete examination of the application of any criterion as the test of a sta- 
tistical hypothesis should involve, also, investigation of “errors of the second 
kind,” i.e., the risk of accepting the hypothesis when some alternative is true. 
That is to say such an examination should involve a study of the “power func- 
tion of the test” [17, 18, 19], and this would require a knowledge of the proba- 
bility distribution of the criterion when the hypothesis tested is not true. In 


this paper, however, attention will be confined entirely to “errors of the first 
kind.” 


5. Hypotheses Tested. In order that

(14a)  $u_1 = \sum_{j,k} (y_{jk} - y_{\cdot k} - y_{j\cdot} + y_{\cdot\cdot})^2$

and

(15a)  $u_2 = n \sum_{k} (y_{\cdot k} - y_{\cdot\cdot})^2$

may be true it is sufficient that

$a_{jk} - a_{\cdot k} - a_{j\cdot} + a_{\cdot\cdot} = X_{j\cdot(k)} - X_{\cdot\cdot(k)} - X_{j\cdot(\cdot)} + X_{\cdot\cdot(\cdot)}$

and

$a_{\cdot k} - a_{\cdot\cdot} = X_{\cdot\cdot(k)} - X_{\cdot\cdot(\cdot)}$

should both be zero in every case. It has been suggested by Neyman [4] that it
would be desirable to test the hypothesis that $X_{\cdot\cdot(k)}$ is independent of k, i.e.
that the average of the true yields over the whole field is the same for all
varieties. He suggests that the variations in the responses of the different
varieties within the field are relatively unimportant so that, while allowing for
the effect of the variations in fertility within the field on the various distribution
laws, it is the average over the whole field which should be tested. The
functions $u_1$ and $u_2$ will not test this hypothesis for, in order that they may have
the same expectation not only must $X_{\cdot\cdot(k)}$ be independent of k but also $X_{j\cdot(k)}$
must be independent of k for every j. Consequently one of the hypotheses
tested here is that $X_{j\cdot(k)} = X_{j\cdot(\cdot)}$, and therefore, of course, $X_{\cdot\cdot(k)} = X_{\cdot\cdot(\cdot)}$, for
every j and k, i.e. that the mean of the true yields over all the blocks is the same
for all varieties while, by using (10), we make allowance for the variations in
fertility over each block and for the resultant correlations introduced. We shall
not consider $u_3$ from (16) as we are interested only in the presence or absence
of a "variety effect."

It appears that two other hypotheses lead to results which are particular cases
of the above. If we test whether the true yield on every plot is the same for
all varieties, i.e. that $X_{jl(k)}$ is independent of k, then, assuming the hypothesis
tested is true, the varieties all react in the same way to the variations of
fertility within each block and in (10) $\sigma_k = \sigma_m = \sigma$, say, while $\rho_{km} = \rho$. On the
other hand if we neglect all the variations in fertility within each block all the
correlations vanish and $\sigma_k = \sigma_m = \sigma_t$. The hypothesis tested then is that
either $X_{jl(k)}$ or, what is the same thing, $X_{j\cdot(k)}$ is independent of k.

It does not appear that the assumption of normality need cause any difficulty. 
E. S. Pearson [20] has examined the effect of skewness on the parent popula- 
tions and by carrying out sampling experiments has concluded that even with 
skew populations ‘...it seems probable that the more elaborate forms of 
analysis of variance are also of fairly wide application, provided that the number 
of degrees of freedom apportioned to the residual variation is not too small.” 
A further investigation by Eden and Yates [21] was also designed to test the 
effect of skewness, but the negative result there obtained was to be expected 
owing to the amalgamation of the observations into groups. It appears that
skewness in the original populations will not have very much
effect on the distribution of z.

Welch has examined [22] Randomized Blocks and Latin Square experiments
from the "randomization" point of view. In the case of randomized blocks,
in terms of the notation used above, he has taken $\epsilon_{jk} = 0$ or, expressed in
another way, he has assumed that the actual observed yield in any plot is the
"true yield" on that plot of the particular variety tested on the plot. The
hypothesis he is then testing is that $x_{jk}$, or, what is the same thing for him,
$X_{jl(k)}$, is independent of k. Taking the $(s!)^n$ different ways in which the varieties
may be tagged on to the different yields he has considered the $(s!)^n$ different
values of what we have called w and he has compared the finite discrete
distribution so obtained with that given by normal theory. Getting $E(w)$ and $\sigma_w^2$
from the finite distribution, he fitted a Pearson Type I curve in a number of
examples and found that the 5 per cent and 1 per cent points in his fitted curves
did not differ much from the corresponding points of the normal distribution
of w. His theoretical discussion showed, however, that if there is too much
discrepancy between the variances in the different blocks the randomization
test may seriously underestimate the significance of any differences between
the varieties as compared with normal theory.

It was Neyman [4] who first pointed out that, when the variations of fer- 
tility within each block are taken into account, the correlations between the 
observed yields should be allowed for, and the method adopted here is a de- 
velopment of his point of view. A number of authors, however, while agreeing 
that such variations of fertility do occur, hold that this does not seriously affect 
the distribution of z. 


6. Distribution of $u_1$ and $u_2$. As already stated, it is the distribution of
$v = u_2/u_1$ which will be sought, not that of z, where $u_1$ and $u_2$ are defined by
(14) and (15), or rather by (14a) and (15a), since the hypothesis tested is
assumed true. Writing $i = \sqrt{-1}$, the characteristic function of the simultaneous
distribution of $u_1$ and $u_2$, that is $E[\exp\{i(t_1 u_1 + t_2 u_2)\}]$, is found from (10).

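From (14a) and (15a), by straightforward expansion, using the conventions
already explained for $\sum_{k,m}$, $\sum_{k \ne m}$ etc., we get

$u_1 = (ns)^{-1}\Big[(n-1)(s-1)\sum_{j,k} y_{jk}^2 - (n-1)\sum_{j}\sum_{k \ne m} y_{jk}\,y_{jm} - (s-1)\sum_{j \ne l}\sum_{k} y_{jk}\,y_{lk} + \sum_{j \ne l}\sum_{k \ne m} y_{jk}\,y_{lm}\Big],$

$u_2 = (ns)^{-1}\Big[(s-1)\Big(\sum_{j,k} y_{jk}^2 + \sum_{j \ne l}\sum_{k} y_{jk}\,y_{lk}\Big) - \sum_{j}\sum_{k \ne m} y_{jk}\,y_{jm} - \sum_{j \ne l}\sum_{k \ne m} y_{jk}\,y_{lm}\Big];$

and using these expressions with (10) the characteristic function of $u_1$ and $u_2$ is

$\phi_{u_1,u_2}(t_1, t_2) = \int P\{y_{jk}\}\,\exp\{i(t_1 u_1 + t_2 u_2)\}\,dY = A^{n/2}\,\pi^{-ns/2}\int \exp\Big\{-\sum_{j,l}\sum_{k,m} B_{jk,lm}\,y_{jk}\,y_{lm}\Big\}\,dY$

where $dY = \prod_{j,k} dy_{jk}$ and the integral is an ns-fold one taken over the whole space
of these variables. $B_{jk,lm}$ is defined by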


(17)  $B_{jk,lm} = \delta_{jl}\,A_{km} - i(s\delta_{km} - 1)[t_1(n\delta_{jl} - 1) + t_2]/ns$

where the $\delta$'s have the usual meaning, being equal to 1 when the suffixes are the
same and equal to zero when the suffixes are different. This integral, since the
real part of $B_{jk,lm}$ is positive definite, may readily be evaluated [23, 24] and
gives

(18)  $\phi_{u_1,u_2}(t_1, t_2) = A^{n/2}/B^{1/2},$


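B being the ns-rowed determinant $|B_{jk,lm}|$.
The determinant B may be written in the form

$B = \begin{vmatrix} [P] & [Q] & \cdots & [Q] \\ [Q] & [P] & \cdots & [Q] \\ \vdots & \vdots & \ddots & \vdots \\ [Q] & [Q] & \cdots & [P] \end{vmatrix}$

where $[P] = [p_{km}] = [B_{jk,jm}]$ and $[Q] = [q_{km}] = [B_{jk,lm}]$, $j \ne l$, and there are $n^2$ such
arrays in B. This gives at once $B = |p_{km} + (n-1)q_{km}| \cdot |p_{km} - q_{km}|^{n-1}$,
whence on substitution

(19)  $B = |A_{km} - it_2(s\delta_{km} - 1)/s| \cdot |A_{km} - it_1(s\delta_{km} - 1)/s|^{n-1}.$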


The two determinants in (19) are identical, with $t_1$ and $t_2$ interchanged, and
are readily reduced to symmetrical (s - 1)-rowed determinants by: (a) Adding
to the terms in the last row the corresponding terms in the other rows and
repeating for columns, (b) Multiplying the terms in the last row successively
by $M_k/M$ (k = 1, 2, 3, ..., s) and subtracting from the corresponding terms
in each of the other rows, with

(20)  $M_k = \sum_{m=1}^{s} A_{km}$ and $M = \sum_{k=1}^{s} M_k = \sum_{k,m=1}^{s} A_{km}.$

The following operations then reduce these (s - 1)-rowed determinants to ones
which are symmetrical and contain t's only in the diagonals: (i) To the terms
in the last column add the corresponding terms in all the other columns and
repeat for the rows, (ii) Multiply the terms in the last column by $(\sqrt{s} + 1)^{-1}$
and add them to the corresponding terms in each of the other columns, repeat
for the rows, (iii) From the terms in the last column subtract the sum of the
corresponding terms in the other columns multiplied by $s^{-1/2}$, repeat for the rows,
(iv) Divide the last row and the last column by $s^{-1/2}$. The determinant then
becomes $(M/s)\,|C - itI|$ where $\|C\|$ is the matrix $\|c_{km}\|$, I the unit matrix
and

(21)  $c_{km} = c_{mk} = A_{km} - (\sqrt{s} + 1)^{-1}(A_{ks} + A_{sm}) + (\sqrt{s} + 1)^{-2}A_{ss} - M^{-1}[M_k - (\sqrt{s} + 1)^{-1}M_s][M_m - (\sqrt{s} + 1)^{-1}M_s].$

It should be noted that henceforward k and m run through integral values
from 1 to s - 1 only unless the contrary is specifically stated.
Thus it follows that 


$\phi_{u_1,u_2}(t_1, t_2) = (As/M)^{n/2}\,|C - it_1 I|^{-\frac12(n-1)} \cdot |C - it_2 I|^{-\frac12}.$

Putting $C = |c_{km}|$ and noting that $\phi_{u_1,u_2}(t_1, t_2) = 1$ when $t_1 = t_2 = 0$, clearly
$As/M = C$ and the characteristic function factors into the form $\phi_{u_1}(t_1) \cdot \phi_{u_2}(t_2)$,
where

(22)  $\phi_{u_1}(t_1) = C^{\frac12(n-1)}\,|C - it_1 I|^{-\frac12(n-1)},$
(23)  $\phi_{u_2}(t_2) = C^{\frac12}\,|C - it_2 I|^{-\frac12}.$

This demonstrates that $u_1$ and $u_2$ are stochastically independent and that the
correlations introduced by allowing for the variations in fertility within the
blocks do not affect the independence already demonstrated [16, 25].

$\|C\|$ being a square positive definite matrix of rank s - 1, its characteristic
equation $|C - \lambda I| = 0$ must have s - 1 real positive roots. It follows that
$|C - itI|$ must factor into s - 1 factors of the type $a - it$ where a is a real positive
constant. Some or all of these factors may be equal and various combinations
of factors of different multiplicity are possible depending on the value of s.
Only two cases will be considered here: (a) when all the roots of the characteristic
equation of $\|C\|$ are equal, and (b) when all the roots of the characteristic
equation are unequal.

Suppose that all the roots of the characteristic equation are equal, say to a,
then $|C - itI| = (a - it)^{s-1}$ and $C = a^{s-1}$, giving

(24)  $\phi_{u_1}(t_1) = a^{\frac12(n-1)(s-1)}\,(a - it_1)^{-\frac12(n-1)(s-1)},$
(25)  $\phi_{u_2}(t_2) = a^{\frac12(s-1)}\,(a - it_2)^{-\frac12(s-1)}.$

It is seen at once that $u_1$ and $u_2$ are distributed as in (11), with $f_1 = (n - 1)(s - 1)$
and $f_2 = s - 1$, and thus z or v follow the usual distribution laws.

Clearly when the variations of fertility within each block are neglected and the
hypothesis tested is that $X_{jl(k)}$, or $X_{j\cdot(k)}$, is independent of k, the roots of the
characteristic equation are all equal. Then there is no correlation, $\sigma_k = \sigma$,
$A_{kk} = (2\sigma^2)^{-1} = c_{kk} = a$, $A_{km} = 0 = c_{km}$ ($k \ne m$) and the usual results are
obvious.

On the other hand when allowing for the variations of fertility within a block
while testing the hypothesis that $X_{jl(k)}$ is independent of k, the variances and
covariances are all equal, i.e. $\sigma_k^2 = \sigma_s^2 + \sigma_t^2 = \sigma^2$, $\rho_{kk} = 1$ and
$\rho_{km} = \rho = -\sigma_s^2/\{(s - 1)(\sigma_s^2 + \sigma_t^2)\}$, $k \ne m$. This gives

$\Delta_{km} = [\{1 + (s - 1)\rho\}\delta_{km} - \rho](1 - \rho)^{s-2}$,  $\Delta = \{1 + (s - 1)\rho\}(1 - \rho)^{s-1}$,
$A_{km} = \Delta_{km}/2\sigma^2\Delta$,  $A = 1/(2\sigma^2)^s\Delta$,
$c_{km} = \delta_{km}\{2\sigma^2(1 - \rho)\}^{-1}$,  $C = \{2\sigma^2(1 - \rho)\}^{-(s-1)}$,

where, as usual, $\delta_{km} = 1$ when k = m and $\delta_{km} = 0$ when $k \ne m$. From this it follows
that the roots of the characteristic equation are all equal, $a = \{2\sigma^2(1 - \rho)\}^{-1}$ in (24)
and (25). Thus in this case also, the z-test or its equivalent gives exact control of
errors of the first kind. There is, however, this difference that $u_1/(n - 1)(s - 1)$ and
$u_2/(s - 1)$ are to be considered not as estimates of $\sigma^2$ but as estimates of
$\sigma^2(1 - \rho) = (s - 1)^{-1}\{s\sigma_s^2 + (s - 1)\sigma_t^2\}$.
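The equal-roots property in the equicorrelated case can be checked numerically. The sketch below rests on an assumption that is not the paper's own route: instead of forming the matrix C, it centres the covariance matrix of one block with $P = I - J/s$ (J being the matrix of ones) and checks that the result is $\sigma^2(1-\rho)P$, so that every non-zero latent root of the centred covariance equals $\sigma^2(1-\rho)$. The function names and the numerical values of s, $\sigma_s^2$ and $\sigma_t^2$ are hypothetical.

```python
def centred_covariance(cov):
    """Return P * cov * P with P = I - J/s, i.e. the covariance matrix of the
    yields of one block after removal of the block mean."""
    s = len(cov)
    P = [[(1.0 if k == m else 0.0) - 1.0 / s for m in range(s)]
         for k in range(s)]

    def matmul(a, b):
        return [[sum(a[i][t] * b[t][j] for t in range(s)) for j in range(s)]
                for i in range(s)]

    return matmul(matmul(P, cov), P)

def equicorrelated(s, sigma_sq, rho):
    """Covariance matrix with common variance sigma_sq and common correlation rho."""
    return [[sigma_sq * (1.0 if k == m else rho) for m in range(s)]
            for k in range(s)]

# Hypothetical soil and technical variances for s = 4 varieties.
s, sigma_s_sq, sigma_t_sq = 4, 3.0, 2.0
sigma_sq = sigma_s_sq + sigma_t_sq
rho = -sigma_s_sq / ((s - 1) * sigma_sq)
centred = centred_covariance(equicorrelated(s, sigma_sq, rho))
scale = sigma_sq * (1.0 - rho)   # the quantity estimated by u2/(s - 1)
```

With these values `scale` equals $(s-1)^{-1}\{s\sigma_s^2 + (s-1)\sigma_t^2\} = 6$, in agreement with the expression in the text.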

When s = 2, even though the variances differ, since there is only one root of
the characteristic equation, $a = (\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)^{-1}$, the characteristic functions
are of the form (24) and (25). Consequently, in this case, s = 2, when only two
varieties are tested for the hypothesis that their average "true yields" are the
same on each block then, even though the varieties may react in different ways
to the fertility levels within the blocks, granting normality, the usual
z-distribution applies. This, of course, includes the case when even though $\rho$ may be
zero the variances differ. $u_1/(n - 1)$ and $u_2$ are to be considered as estimates of
$\tfrac12(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)$.

Proceeding next to the case in which all the roots of the characteristic equation
$|C - \lambda I| = 0$ are unequal, the roots are, say, $a_1 < a_2 < \cdots < a_{s-1}$ where, of
course, all these quantities are real and positive. This case will arise in testing
the hypothesis that the yield for each variety is the same for every block, that is
$X_{j\cdot(k)} = X_{j\cdot(m)}$, while allowance is made for the different responses of the
varieties to the differences in fertility within the blocks. The mathematical
formulation would be the same even if there were no correlations but the
variances were different for the different varieties. Then we have [26, 27, 28] for
both $u_1$ and $u_2$ from (22) and (23)

(26)  $p(u) = C^m (2\pi)^{-1} \int_{-\infty}^{\infty} e^{-iut}\Big[\prod_{k=1}^{s-1} (a_k - it)\Big]^{-m} dt$

with $m = \tfrac12(n - 1)$ for $u_1$ and $m = \tfrac12$ for $u_2$.

Replace t by the complex variable z and integrate round the contour shown in
Fig. 1. This contour consists of: (i) The real axis from +R to -R, where
$R > a_{s-1}$, (ii) The quadrant $|z| = R$, $\pi < \arg z < 3\pi/2$, (iii) The imaginary
axis from $A[-iR]$ to $B[-(a_1 - r)i]$, cutting out the singularities by small
semicircles of radius r, as shown, (iv) The imaginary axis from B to A, as in (iii), (v)
The quadrant $|z| = R$, $3\pi/2 < \arg z < 2\pi$. Within this contour f(z) is analytic
and hence the contour integral is zero. It may also be readily seen that the
integrals over the two quadrants tend to zero as R increases, and by examining
the changes in the amplitudes of $(a_k - iz)^{-m}$, k = 1, 2, ..., s - 1, as the
contour circles the points $-ia_1, -ia_2, \cdots$, it will be seen that the integrals over
the straight lines between $(-ia_2, -ia_3)$, $(-ia_4, -ia_5)$, ... cancel whether m be
half an odd or half an even integer. Then

(26a)  $p(u) = C^m (2\pi)^{-1} \sum_r \int_{D_r} e^{-iuz}\Big[\prod_{k=1}^{s-1} (a_k - iz)\Big]^{-m} dz.$


The contours $D_r$ are those shown in Fig. 2 and consist of "dumb-bells" encircling
the points $(-ia_r, -ia_{r+1})$, r = 1, 3, 5, ..., in the negative direction. If s is
even, the last integral consists of only one half the "dumb-bell" extending to
$-i\infty$. It may also be noted that if n be odd and so m an integer, the other
straight line integrals, those of the "dumb-bell" contours in question, also cancel
and leave only the contributions of the small circles about what are now
the poles.

Fig. 2


Now we put $iz = w$ and $\Phi_r(w) = \prod_{k=1}^{s-1} (a_k - w)^{-m}$, omitting the terms
containing $a_r$ and $a_{r+1}$; it follows that $\Phi_r(w)$ is analytic in and on a circle of centre
$\tfrac12(a_r + a_{r+1})$ which contains the points $a_r$ and $a_{r+1}$ but not $a_{r-1}$ or $a_{r+2}$. Thus
the function $\Phi_r(w)$ may in the interior of this circle be expanded as a uniformly
convergent series in terms of $\tfrac12(a_r + a_{r+1}) - w$ giving, r = 1, 3, ...,

(27)  $\Phi_r(w) = \sum_{p=0}^{\infty} \alpha_{rp}\,\{\tfrac12(a_r + a_{r+1}) - w\}^p.$

Since termwise integration is then permissible it is necessary to consider only
integrals of the form

$J_{rp} = i\int_{D_r'} \dfrac{\{\tfrac12(a_r + a_{r+1}) - w\}^p\,e^{-uw}\,dw}{\{(a_r - w)(a_{r+1} - w)\}^m}$

where $D_r'$ is a contour similar to $D_r$ but circling instead in a positive direction the
points $a_r$ and $a_{r+1}$ on the real axis. We then have

(26b)  $p(u) = C^m (2\pi)^{-1} \sum_r \sum_{p=0}^{\infty} \alpha_{rp}\,J_{rp}.$


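Now if

$J_r = i\int_{D_r'} \dfrac{e^{-uw}\,dw}{\{(a_r - w)(a_{r+1} - w)\}^m},$

it is clear that $J_{rp}$ is obtained by applying the operator $\{\tfrac12(a_r + a_{r+1}) + \partial/\partial u\}$
p times to $J_r$. Now putting $w - \tfrac12(a_r + a_{r+1}) = \tfrac12(a_{r+1} - a_r)t$, it follows that

$J_r = \dfrac{i\,e^{-\frac12 u(a_r + a_{r+1})}}{\{\tfrac12(a_{r+1} - a_r)\}^{2m-1}} \int^{(1+,\,-1+)} \dfrac{e^{-\frac12 ut(a_{r+1} - a_r)}}{(t^2 - 1)^m}\,dt$

and this gives [29, p. 171],

$J_r = \dfrac{2\pi\,\Gamma(\tfrac12)\,u^{m-\frac12}\,e^{-\frac12 u(a_r + a_{r+1})}\,I_{m-\frac12}\{\tfrac12 u(a_{r+1} - a_r)\}}{\Gamma(m)\,(a_{r+1} - a_r)^{m-\frac12}}.$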


$I_\nu(z)$ is the Bessel Function of purely imaginary argument defined,
2 < 2m, by 


(28)  $I_\nu(z) = \sum_{r=0}^{\infty} \dfrac{(\tfrac12 z)^{\nu + 2r}}{r!\,\Gamma(\nu + r + 1)}.$


Hence it may be found that

$J_{rp} = \dfrac{2\pi\,\Gamma(\tfrac12)\,e^{-\frac12 u(a_r + a_{r+1})}}{\Gamma(m)\,(a_{r+1} - a_r)^{m-\frac12}}\;\dfrac{\partial^p}{\partial u^p}\Big[u^{m-\frac12}\,I_{m-\frac12}\{\tfrac12 u(a_{r+1} - a_r)\}\Big],$

and this gives

(29)  $p(u) = \dfrac{C^m\,\Gamma(\tfrac12)}{\Gamma(m)} \sum_r \dfrac{e^{-\frac12 u(a_r + a_{r+1})}}{(a_{r+1} - a_r)^{m-\frac12}} \sum_{p=0}^{\infty} \alpha_{rp}\,\dfrac{\partial^p}{\partial u^p}\Big[u^{m-\frac12}\,I_{m-\frac12}\{\tfrac12 u(a_{r+1} - a_r)\}\Big],$

where $\alpha_{rp}$ is defined by (27) with $m = \tfrac12(n - 1)$ for $u_1$ and $m = \tfrac12$ for $u_2$.
In the case s = 3 there is only one "dumb-bell" contour and $\Phi_1(w) = 1$, so
that we get, for $u_1, u_2 > 0$,

(30)  $p(u_1) = \dfrac{(a_1 a_2)^{\frac12(n-1)}\,\Gamma(\tfrac12)}{(a_2 - a_1)^{\frac12(n-2)}\,\Gamma\{\tfrac12(n - 1)\}}\;e^{-\frac12 u_1(a_1 + a_2)}\,u_1^{\frac12(n-2)}\,I_{\frac12(n-2)}\{\tfrac12 u_1(a_2 - a_1)\},$

(31)  $p(u_2) = (a_1 a_2)^{\frac12}\,e^{-\frac12 u_2(a_1 + a_2)}\,I_0\{\tfrac12 u_2(a_2 - a_1)\}.$

It may be noted that if the series in (28) is substituted in (30) and (31) these
distributions may be considered as the sum of an infinite number of $\chi^2$-distributions
all with the same $\sigma^2$ but with different degrees of freedom. It may also be
noted that with $a_1 = a_2$ all the terms, except the first, vanish and thus a single
$\chi^2$-distribution is left.
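As a rough numerical check of the form of (31), the density $(a_1 a_2)^{1/2} e^{-\frac12 u(a_1+a_2)} I_0\{\tfrac12 u(a_2 - a_1)\}$ can be integrated by Simpson's rule. This is only a sketch under stated assumptions: the function names, the roots $a_1 = 1$, $a_2 = 3$, the truncation of the range at u = 40 and the number of steps are all hypothetical choices, and $I_0$ is summed from the series (28) with $\nu = 0$. The total should be close to 1 and the mean close to $\tfrac12(a_1^{-1} + a_2^{-1})$.

```python
import math

def bessel_i0(z):
    """I_0(z) from the series (28) with nu = 0: sum over r of (z/2)^(2r) / (r!)^2."""
    term, total, r = 1.0, 0.0, 0
    while term > 1e-17 * total:
        total += term
        r += 1
        term *= (0.5 * z) ** 2 / (r * r)
    return total

def density_u2(u, a1, a2):
    """Density (31) of u2 for s = 3 with unequal roots a1 < a2."""
    return (math.sqrt(a1 * a2) * math.exp(-0.5 * u * (a1 + a2))
            * bessel_i0(0.5 * u * (a2 - a1)))

def simpson(f, lo, hi, steps=4000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    acc = f(lo) + f(hi)
    for i in range(1, steps):
        acc += (4 if i % 2 else 2) * f(lo + i * h)
    return acc * h / 3.0

a1, a2 = 1.0, 3.0   # hypothetical roots of the characteristic equation
total = simpson(lambda u: density_u2(u, a1, a2), 0.0, 40.0)
mean = simpson(lambda u: u * density_u2(u, a1, a2), 0.0, 40.0)
```

The upper limit 40 is chosen so that the neglected tail, of order $e^{-40}$ for these roots, is far below the accuracy of the quadrature.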

When s is even the last contour is one from $+\infty$ circling $a_{s-1}$ negatively.
Using Hankel's integral for the Gamma-Function, putting $w - a_{s-1} = \zeta/u$,

$I = i\int_{\infty}^{(a_{s-1}+)} e^{-uw}\,(a_{s-1} - w)^{-m}\,dw = i\,e^{-ua_{s-1}}\,u^{m-1}\int_{\infty}^{(0+)} (-\zeta)^{-m}\,e^{-\zeta}\,d\zeta = \dfrac{2\pi\,e^{-ua_{s-1}}\,u^{m-1}}{\Gamma(m)}.$

Denoting by D differentiation with respect to u under the sign of integration
and by $D^{-1}$ the corresponding integration from zero to u,

$D^p I = i\int_{\infty}^{(a_{s-1}+)} (-w)^p\,e^{-uw}\,(a_{s-1} - w)^{-m}\,dw.$


Then we can write 


s—2 s—2 
I in +g * = (~gyp at (1 — ax/w)™ 
ae c=1 


0 
}e Gack -— eer 
p=0 


the expansion being justifiable since $|a_k/w| < 1$. Since (s - 2)m is an integer
the additional term to be added to (29) to give p(u) is, therefore,


(32) Ot ag Orrin eri. 


p=0 


7. Distribution of $v = u_2/u_1$. Though the distributions of $u_1$ and $u_2$ have
been given in a rather complicated form for any value of s when the roots of
$|C - \lambda I| = 0$ are all unequal, the distribution of v is given only for s = 3.
In this case, since $u_1$ and $u_2$ are independent, from (30) and (31)

$p(u_1, u_2) = \dfrac{(a_1 a_2)^{\frac12 n}\,\Gamma(\tfrac12)}{(a_2 - a_1)^{\frac12(n-2)}\,\Gamma\{\tfrac12(n - 1)\}}\;u_1^{\frac12(n-2)}\,e^{-\frac12(u_1 + u_2)(a_1 + a_2)}\,I_0\{\tfrac12 u_2(a_2 - a_1)\}\,I_{\frac12(n-2)}\{\tfrac12 u_1(a_2 - a_1)\}.$

Now making the transformation $u_1 = u$, $u_2 = uv$ with $\partial(u_1, u_2)/\partial(u, v) = u$ and putting
the exponential term

$e^{-\frac12 u(1+v)(a_1 + a_2)} = \{u(1 + v)(a_1 + a_2)/\pi\}^{\frac12}\,K_{\frac12}\{\tfrac12 u(1 + v)(a_1 + a_2)\},$

then on integrating with respect to u over the whole range of variation, from
zero to infinity, we get

$p(v) = \dfrac{(a_1 a_2)^{\frac12 n}\,\{(1 + v)(a_1 + a_2)\}^{\frac12}}{(a_2 - a_1)^{\frac12(n-2)}\,\Gamma\{\tfrac12(n - 1)\}} \int_0^{\infty} u^{\frac12(n+1)}\,I_0\{\tfrac12 uv(a_2 - a_1)\}\,I_{\frac12(n-2)}\{\tfrac12 u(a_2 - a_1)\}\,K_{\frac12}\{\tfrac12 u(1 + v)(a_1 + a_2)\}\,du,$

$K_\nu(z)$ being the modified Bessel Function of the second kind.

This integral is a particular case of one investigated by Bailey [30, 31] and it
gives

(33)  $p(v) = \dfrac{(n - 1)(a_1 a_2)^{\frac12 n}}{\{\tfrac12(1 + v)(a_1 + a_2)\}^n}\;F_4\{\tfrac12(n + 1), \tfrac12 n;\ 1, \tfrac12 n;\ \beta^2 v^2/(1 + v)^2,\ \beta^2/(1 + v)^2\}$

where $\beta = (a_2 - a_1)/(a_2 + a_1)$ and $F_4$ is Appell's fourth hypergeometric function
of two variables [32].

On performing a similar integration when s > 3, p(v) may be obtained as a
rather complicated series of terms similar to (33).


8. Approximate Moments of the Distribution of v. As the distribution of v
is complicated even in the simplest case of s = 3 it appears advisable to examine
its moments even though only approximately. Writing $S_r = \sum_{k=1}^{s-1} a_k^{-r}$ and putting

$\prod_{k=1}^{s-1} (1 - t/a_k)^{-m} = \exp\Big\{-m \sum_{k=1}^{s-1} \log(1 - t/a_k)\Big\} = \exp\Big\{\sum_{r=1}^{\infty} m S_r t^r / r\Big\},$

$\kappa_r$ being the r-th semi-invariant of u, we get

$\kappa_r = S_r\,m\,(r - 1)!$

Thence the mean of u and its second, third and fourth moments about the mean are

$\bar{u} = mS_1$,  $\mu_2 = mS_2$,
$\mu_3 = 2mS_3$,  $\mu_4 = 3m\{2S_4 + mS_2^2\}$,

m being $\tfrac12(n - 1)$ in the case of $u_1$ and $\tfrac12$ in the case of $u_2$.
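The semi-invariants and moments above are easily evaluated for given roots. The following sketch (function names and numerical roots are hypothetical) computes $\kappa_r = m\,(r-1)!\,S_r$ and converts to moments about the mean using the standard relation $\mu_4 = \kappa_4 + 3\kappa_2^2$; when the roots are all equal to $\tfrac12$ (unit variance) the results reduce to those of a $\chi^2$ variable with $f = 2m(s-1)$ degrees of freedom, i.e. mean f and variance 2f.

```python
import math

def cumulants(roots, m, order=4):
    """Semi-invariants kappa_r = m (r-1)! S_r, with S_r = sum over k of a_k**(-r)."""
    return [m * math.factorial(r - 1) * sum(a ** (-r) for a in roots)
            for r in range(1, order + 1)]

def mean_and_central_moments(roots, m):
    """Return (mean, mu_2, mu_3, mu_4) of u for the given roots a_k and index m."""
    k1, k2, k3, k4 = cumulants(roots, m)
    return k1, k2, k3, k4 + 3.0 * k2 ** 2

# Equal roots a = 1/2 (sigma^2 = 1), s - 1 = 2, m = 1/2: chi-square with 2 d.f.
chi2_case = mean_and_central_moments([0.5, 0.5], 0.5)

# Hypothetical unequal roots, m = (n - 1)/2 with n = 4.
unequal_case = mean_and_central_moments([1.0, 3.0], 1.5)
```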
Now to get approximate moments for v we write $\xi = (u_1 - \bar{u}_1)/\bar{u}_1$ and
$\eta = (u_2 - \bar{u}_2)/\bar{u}_2$ and, expanding in terms of $\xi$ and $\eta$, obtain

$v = (\bar{u}_2/\bar{u}_1)\{1 + \eta - \xi - \xi\eta + \xi^2 + \xi^2\eta - \cdots\}.$

This gives, $M_r'$ being the r-th moment of v about the origin and writing
$T_r = \sum_{k=1}^{s-1} a_k^{-r} \Big/ \Big(\sum_{k=1}^{s-1} a_k^{-1}\Big)^r$,

$M_1' = (n - 1)^{-1}[1 + 2T_2(n - 1)^{-1} - 8T_3(n - 1)^{-2} + 12\{4T_4 + (n - 1)T_2^2\}(n - 1)^{-3} \cdots],$

$M_2' = (n - 1)^{-2}(1 + 2T_2)[1 + 6T_2(n - 1)^{-1} - 32T_3(n - 1)^{-2} + 60\{4T_4 + (n - 1)T_2^2\}(n - 1)^{-3} \cdots],$

$M_3' = (n - 1)^{-3}(1 + 6T_2 + 8T_3)[1 + 12T_2(n - 1)^{-1} - 80T_3(n - 1)^{-2} + 180\{4T_4 + (n - 1)T_2^2\}(n - 1)^{-3} \cdots],$

$M_4' = (n - 1)^{-4}\{1 + 12T_2 + 32T_3 + 12(4T_4 + T_2^2)\}[1 + 20T_2(n - 1)^{-1} - 160T_3(n - 1)^{-2} + 420\{4T_4 + (n - 1)T_2^2\}(n - 1)^{-3} \cdots].$

The moments around the mean may readily be found if needed. If the a's
are all equal

$M_r' = \dfrac{\Gamma(\tfrac12 f_1 - r)\,\Gamma(\tfrac12 f_2 + r)}{\Gamma(\tfrac12 f_1)\,\Gamma(\tfrac12 f_2)}$

from the known distribution of the ratio of two $\chi^2$'s with $f_2 = (s - 1)$ and
$f_1 = (n - 1)(s - 1)$ degrees of freedom respectively. Then developing $M_r'$ as a
series in terms of $f_1^{-1}$ and $f_2^{-1}$

$M_1' = (f_2/f_1)(1 + 2f_1^{-1} + 4f_1^{-2} + \cdots),$

$M_2' = (f_2/f_1)^2(1 + 2f_2^{-1})(1 + 6f_1^{-1} + 28f_1^{-2} + \cdots),$

$M_3' = (f_2/f_1)^3(1 + 6f_2^{-1} + 8f_2^{-2})(1 + 12f_1^{-1} + 100f_1^{-2} + \cdots),$

$M_4' = (f_2/f_1)^4(1 + 12f_2^{-1} + 44f_2^{-2} + 48f_2^{-3})(1 + 20f_1^{-1} + 260f_1^{-2} + \cdots).$

It is then easily seen that the difference between these moments and those of v
when the a's are unequal is due to the deviation of $T_r$ from $(s - 1)^{-(r-1)}$, the value it
would have if the a's were all equal.


9. Numerical Illustrations. The distribution of v has been obtained in
workable form only when s = 3 and, consequently, it is only that case that is
considered here. In equation (33) the variable terms in the Appell function all
contain $\beta = (a_2 - a_1)/(a_2 + a_1)$ and it is this fraction or, perhaps better, its
square which measures, in a sense, the deviation of the distribution of v from the
usual form. There are, therefore, two stages in this examination. It will first
be investigated how $\beta$ changes with the correlations and variances; and then the
changes in the "levels of significance" due to differences in $\beta$ will be examined.

Using (9), (20) and (21) it will be found that the equation $|C - \lambda I| = 0$ for
s = 3 becomes $4p\lambda^2 - 4q\lambda + 3 = 0$ with

$p = \sigma_2^2\sigma_3^2\Delta_{11} + \sigma_3^2\sigma_1^2\Delta_{22} + \sigma_1^2\sigma_2^2\Delta_{33} + 2\sigma_1\sigma_2\sigma_3(\sigma_1\Delta_{23} + \sigma_2\Delta_{31} + \sigma_3\Delta_{12}),$

$q = \sigma_1^2 + \sigma_2^2 + \sigma_3^2 - \sigma_2\sigma_3\rho_{23} - \sigma_3\sigma_1\rho_{31} - \sigma_1\sigma_2\rho_{12},$

where, of course, $\Delta = |\rho_{km}|$ and $\Delta_{km}$ is the cofactor of $\rho_{km}$ in $\Delta$. This equation
may readily be solved giving $a_1$ and $a_2$.

Taking first the case of zero correlation and putting $\sigma_1^2 = k_1\sigma^2$, $\sigma_2^2 = k_2\sigma^2$ and
$\sigma_3^2 = k_3\sigma^2$ it will be found that while $a_1$ and $a_2$ depend on the k's and on $\sigma$ the
fraction $\beta = (a_2 - a_1)/(a_2 + a_1)$ depends only on the k's. For different values
of the k's the following table shows the values of $\beta^2$.

TABLE 1

Values of $\beta^2$ for different values of $\sigma_1^2 : \sigma_2^2 : \sigma_3^2 = k_1 : k_2 : k_3$. No correlation.

$k_1$   $k_2$   $k_3$   $\beta^2$

.003 

037 

.083 

set 

177 

221 

250 

. 250 

250 

529 

0.790 
(N — 1)2/(2N + 1)? 
(N — 1)?/(N + 2)° 


It is clear that to get a considerable value of $\beta^2$, one standard deviation
must be at least three times the other two. It also seems to produce a
considerably larger value of $\beta^2$ to have one large k and two small k's than to have
two large ones and one small, with the same order of magnitude of the ratio
large to small. Furthermore, when the ratios of the $\sigma^2$ are 1 : 1 : N the limit of
$\beta^2$ as N increases is 1, while if the ratios are 1 : N : N the limit is 0.25.
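The dependence of $\beta^2$ on the variances and correlations for s = 3 can be sketched directly from the quadratic $4p\lambda^2 - 4q\lambda + 3 = 0$. The step used here is derived rather than quoted: the roots satisfy $a_1 + a_2 = q/p$ and $a_1 a_2 = 3/4p$, so that $\beta^2 = 1 - 4a_1a_2/(a_1 + a_2)^2 = 1 - 3p/q^2$. The function name and argument layout are hypothetical.

```python
import math

def beta_squared(variances, correlations=(0.0, 0.0, 0.0)):
    """beta^2 for s = 3 varieties.

    variances    -- (sigma_1^2, sigma_2^2, sigma_3^2)
    correlations -- (rho_23, rho_31, rho_12)
    """
    s1, s2, s3 = (math.sqrt(v) for v in variances)
    r23, r31, r12 = correlations
    # Cofactors of the correlation determinant |rho_km|.
    d11, d22, d33 = 1.0 - r23 ** 2, 1.0 - r31 ** 2, 1.0 - r12 ** 2
    d23 = r12 * r31 - r23
    d31 = r12 * r23 - r31
    d12 = r31 * r23 - r12
    p = (s2 ** 2 * s3 ** 2 * d11 + s3 ** 2 * s1 ** 2 * d22
         + s1 ** 2 * s2 ** 2 * d33
         + 2.0 * s1 * s2 * s3 * (s1 * d23 + s2 * d31 + s3 * d12))
    q = (s1 ** 2 + s2 ** 2 + s3 ** 2
         - s2 * s3 * r23 - s3 * s1 * r31 - s1 * s2 * r12)
    # a1 + a2 = q/p and a1*a2 = 3/(4p) give beta^2 = 1 - 3p/q^2.
    return 1.0 - 3.0 * p / q ** 2

b_1_1_25 = beta_squared((1.0, 1.0, 25.0))   # ratios 1 : 1 : N with N = 25
b_1_4_9 = beta_squared((1.0, 4.0, 9.0))
b_equal = beta_squared((1.0, 1.0, 1.0), (0.3, 0.3, 0.3))
```

For uncorrelated variances in the ratios 1 : 1 : N this reduces to $(N-1)^2/(N+2)^2$, and for 1 : N : N to $(N-1)^2/(2N+1)^2$, the two closed forms given at the foot of Table 1; equal variances with equal correlations give $\beta^2 = 0$.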

Examining now the definition of $\rho_{km}$, omitting the j's in equation (7), we find
that $\rho_{km}$ can be written in the form

$\rho_{km} = -r_{km}(s - 1)^{-1}[(1 + \sigma_t^2/\sigma_{sk}^2)(1 + \sigma_t^2/\sigma_{sm}^2)]^{-1/2}$

where

$r_{km} = \dfrac{\sum_{l=1}^{s} u_{jl(k)}\,u_{jl(m)}}{s\,\sigma_{sk}\,\sigma_{sm}}$

and $r_{km}$ is itself a coefficient of correlation, i.e., the correlation between the
true yields. The second part of $\rho_{km}$ depends on s and on the relative magnitudes
of the soil error and of the technical error. Its maximum value, in the case of
s = 3, is 0.5 which occurs when the technical error is small with respect to the
soil error. If both types of error have the same variance then the second term is
0.25. There appear to be no data available which enable us to assign values to
$r_{km}$, so the method adopted is to choose some values of $r_{km}$ which appear likely
to affect seriously the value of $\beta^2$ and then to take the second factor equal to 0.5.
If the values of $\rho_{km}$ are all equal and the variances are also equal the normal
theory has been shown to apply, and hence these values are taken to differ
considerably. Table 2 shows the effect on $\beta^2$ of taking different correlations with
various values of $\sigma_1^2 : \sigma_2^2 : \sigma_3^2$.

It is clear from the table that if there exist correlations of the order of
magnitude of those assumed, they can cause the distribution of v to deviate considerably
from that which arises on the usual theory. For instance, if the variances are
equal the value of $\beta^2$ may be 0.444, a value it would attain if, with no correlations,
one variance was seven times the other two. Taking the cases in which
$\sigma_1^2 : \sigma_2^2 : \sigma_3^2$ = 1 : 4 : 9 or 1 : 9 : 16 the value of $\beta^2$ with no correlations is, in either case,
0.250 while with the correlations it may be as low as 0.008 or as high as 0.869.


TABLE 2

Values of $\beta^2$ for different values of the correlations and of $\sigma_1^2 : \sigma_2^2 : \sigma_3^2$.


$\sigma_1^2 : \sigma_2^2 : \sigma_3^2$
1:9:16 1:25:25 | 


250 | 0.221 
075 | 0.059 
549 0.543 
104 0.056 
429 | 0.402 
698 0.658 
019 | 0.020 
.765 .709 
016 | 0.038 | 
845 .793 | 
.010 042 
.596 551 
035 | 0.012 
On the other hand when $\beta^2$ is large, in the case of zero correlation, say $\beta^2$ = 0.790
when $\sigma_1^2 : \sigma_2^2 : \sigma_3^2$ = 1 : 1 : 25, the correlations, as might be expected, appear to
have less effect, the values of $\beta^2$ varying from 0.654 to 0.954. We may, therefore,
conclude that if such correlations exist their effect on $\beta^2$, and therefore on the
distribution of v, is certainly comparable with that of fairly large differences in
the variances.

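We now examine

$P\{v > v_0 \mid \beta\} = \int_{v_0}^{\infty} p(v)\,dv$

for different values of $v_0$ and $\beta$. Writing p(v) in full from (33) and interchanging
integration and summation we get

$P\{v > v_0 \mid \beta\} = (n - 1)(1 - \beta^2)^{\frac12 n} \sum_{j,k=0}^{\infty} \dfrac{\Gamma\{\tfrac12(n + 1) + j + k\}\,\Gamma(\tfrac12 n + j + k)}{\Gamma\{\tfrac12(n + 1)\}\,\Gamma(\tfrac12 n + k)\,(j!)^2\,k!}\;\beta^{2j+2k} \int_{v_0}^{\infty} v^{2j}(1 + v)^{-n-2j-2k}\,dv.$

Changing the variable to $x = (1 + v)^{-1}$ the integral part becomes

$\int_0^{x_0} x^{2k+n-2}(1 - x)^{2j}\,dx = \dfrac{\Gamma(2j + 1)\,\Gamma(2k + n - 1)}{\Gamma(2j + 2k + n)}\,I_{x_0}(2k + n - 1,\ 2j + 1),$

in the notation usually adopted [15]. Substitution gives

$P\{v > v_0 \mid \beta\} = (n - 1)(1 - \beta^2)^{\frac12 n} \sum_{j,k=0}^{\infty} \dfrac{\Gamma\{\tfrac12(n + 1) + j + k\}\,\Gamma(\tfrac12 n + j + k)\,\Gamma(2j + 1)\,\Gamma(2k + n - 1)}{\Gamma\{\tfrac12(n + 1)\}\,\Gamma(\tfrac12 n + k)\,(j!)^2\,k!\,\Gamma(2j + 2k + n)}\;\beta^{2j+2k}\,I_{x_0}(2k + n - 1,\ 2j + 1).$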


Two sets of values of this expression were obtained, one for n = 3, and the
other for n = 6, while $\beta^2$ was given the values 0.1, 0.2, 0.3, 0.4, and 0.5. The
values of $x_0$ were chosen so as to cover the 1, 5 and 10 per cent significance levels.
Table 3 shows these results.
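A calculation of this kind can be sketched by integrating the double series of Appell's function in (33) term by term. The code below is an illustration only and makes several assumptions that are not in the paper: the function names are hypothetical, the double series is truncated at total order 40, and each integral $\int_0^{x_0} x^{2k+n-2}(1-x)^{2j}dx$ is evaluated by Simpson's rule rather than from tables of the Incomplete Beta Function. With $\beta = 0$ the result reduces to $x_0^{\,n-1}$.

```python
import math

def beta_integral(x0, p, q, steps=400):
    """Simpson's rule for the integral of x**p * (1 - x)**q over [0, x0]."""
    h = x0 / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * x ** p * (1.0 - x) ** q
    return total * h / 3.0

def pochhammer(a, r):
    """Rising factorial a (a + 1) ... (a + r - 1)."""
    out = 1.0
    for i in range(r):
        out *= a + i
    return out

def tail_probability(x0, n, beta_sq, max_order=40):
    """P{v > v0 | beta} with x0 = 1/(1 + v0), for s = 3 varieties and n blocks."""
    a, b = 0.5 * (n + 1), 0.5 * n
    total = 0.0
    for order in range(max_order + 1):
        for j in range(order + 1):
            k = order - j
            coeff = (pochhammer(a, order) * pochhammer(b, order)
                     / (math.factorial(j) ** 2 * math.factorial(k)
                        * pochhammer(b, k)))
            total += (coeff * beta_sq ** order
                      * beta_integral(x0, 2 * k + n - 2, 2 * j))
    return (n - 1) * (1.0 - beta_sq) ** (0.5 * n) * total

p_null = tail_probability(0.2, 3, 0.0)   # equals x0**2 = 0.04 when beta = 0
p_a = tail_probability(0.15, 3, 0.1)
p_b = tail_probability(0.2, 3, 0.3)
```

Under these assumptions `p_a` and `p_b` come out close to 0.025 and 0.051, which agree with two of the legible entries for n = 3 in Table 3 if its rows are taken to correspond to $x_0 = 0.15$ and $x_0 = 0.20$.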


TABLE 3
$P\{v > v_0 \mid \beta\}$ for certain values of $v_0$ and $\beta$
(a) n = 3


Values of $\beta^2$


0.0 | | 0.2 0.3 


0. 0025 : 0. 004 
0.010 ‘ ‘ 0.014 
0.0225 .025 .027 0.030 
0.040 04: ‘ | 0.051 
0.090 ‘ . 0.106 
0.160 ‘ 17 0.175 


0.002 | 0.003 | 0. 0. 0. 006 
0.010 0.012 | 0. 0.020 
0.031 0. = .035 .043 | 0.047 
aa | 
0.1 


0.078 i 0.098 
0.168 | | 0.181 
The 1, 5, and 10 per cent levels of significance for $x_0$ were obtained in both
cases by graphical interpolation and the corresponding values of v then
calculated. Table 4 shows clearly the changes in these significance levels. It
must be remembered that values of $\beta^2$ considerably in excess of 0.5 may easily
arise.
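The link between $x_0$ and $v_0$ can be illustrated with a short sketch. It assumes the change of variable $x = (1 + v)^{-1}$ and, for the case $\beta^2 = 0$ only, that the tail probability reduces to $x_0^{\,n-1}$. The second relation is an assumption inferred from the tabulated figures (for example 0.0025, 0.010, 0.0225 in the first column of Table 3, and the pairs $x_0 = 0.10$, $v_0 = 9.00$ and $x_0 = 0.40$, $v_0 = 1.51$ in Table 4), not a formula stated in the text. The function names are illustrative.

```python
def v_from_x(x0):
    """Invert the change of variable x = 1 / (1 + v)."""
    return 1.0 / x0 - 1.0


def tail_probability_no_correlation(x0, n):
    """P{v > v0} when beta^2 = 0, assumed here to equal x0 ** (n - 1)."""
    return x0 ** (n - 1)


def x_at_level(alpha, n):
    """Value x0 whose tail probability is alpha when beta^2 = 0 (same assumption)."""
    return alpha ** (1.0 / (n - 1))


if __name__ == "__main__":
    for n in (3, 6):
        for alpha in (0.01, 0.05, 0.10):
            x0 = x_at_level(alpha, n)
            print(n, alpha, round(x0, 3), round(v_from_x(x0), 2))
```

For $\beta^2 > 0$ no such closed form is assumed here; the levels would have to be read off the incomplete beta function series as in the text.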


TABLE 4 
Changes in the levels of significance, $\beta^2 = 0$ and 0.5, n = 3 and 6.


1 5 10 
per cent. per cent. per cent. 


Zo Vo Zo 


0 0.10 | 9.00 | 0.2 

0.07 13.0 | 0.1 
0 0.40 1.51 | 0.5 
0.5 | 0.382 | 2.1 | 0.5 


0 


This work shows quite clearly that the effect of any correlation between the 
yields, such as that introduced by variations of fertility within a block, or of any 
difference between the yield variance of different varieties tends to cause a
significant deviation to be recognised when, in fact, none exists. When the
number of varieties tested is three, the variation in the levels of significance 
may be quite large. 


10. Conclusion. The mathematical model appropriate to Randomized Block 
Experiments is examined and it is suggested that the use of the z-test, as ordi- 
narily applied, is theoretically justifiable only when the variations in fertility 
within each block are negligible. 

Correlations between the yields of the varieties, due to randomization in a 
limited set, are introduced when the differences in fertility within each block are 
allowed for. 

It is suggested that, as a first approximation, a multinormal population may 
be used for the yields from a given block, the variances and correlations being 
assumed equal from block to block, though the means, of course, differ. 

The simultaneous distribution of the usual sums of squares is found in this case,
and these sums of squares are shown to be independently distributed as the sums 
of squares of s — 1 and (n — 1) (s — 1) quantities from another multinormal 
population. 

It is shown that the usual distribution results apply when the variances and 
correlations of all the varieties are equal as well, of course, as when the variances 
are equal and the correlations zero. It is also shown that the same is true when 
the number of varieties is two, though the variances may differ. 

The distributions of the above sums of squares are obtained for all values of s,
the number of varieties, and the distribution of their ratio for s = 3. The
method of obtaining the distribution of the ratio for s > 3 is also indicated.

The relative importance of the deviations from the usual distribution produced 
by differences in the variances and differences in the correlations is examined 
when s = 3, and it is found when the variances are all equal that the latter can 
produce deviations comparable to one variance being seven times the other two. 

That the presence of the correlations or of non-equality of the variances causes
a tendency for a significant difference to be found when none exists is clearly
shown.


In conclusion, I must express my gratitude to Prof. J. Neyman, now of the
University of California, for suggesting this problem to me, to Dr. R. C. Geary
and to Prof. S. S. Wilks for valuable suggestions.
ON THE DISTRIBUTION OF ERRORS IN N-TH TABULAR DIFFERENCES¹
By ARNOLD N. LOWAN AND JACK LADERMAN


In the construction of mathematical tables, a frequent method of checking
the computed values of the tabulated function is to apply a differencing test.
This test consists of computing the tabular differences of some suitable order n
and comparing them with the theoretical values of the differences computed to a
higher degree of accuracy by an independent method. Whenever the absolute
deviation of a tabular difference $\bar{\Delta}^n$ from the corresponding theoretical
difference $\Delta^n$ exceeds some predetermined upper bound, the entries giving rise
to the difference in question are investigated. Thus, in the computation of the
functions Si(x), Ci(x), Ei(x) and Ei(-x), it was found desirable to check the
final manuscript by comparing the tabular second differences with the values
of the second differences computed to a higher degree of accuracy by an
independent method.²

A study of the distribution of errors suggested the following problem: If we
assume a rectangular distribution of the errors in the entries of a mathematical
table, what is the distribution of errors in the n-th tabular differences?

For the sake of mathematical simplicity, it will be convenient to idealize the
problem as follows: Consider n + 1 random numbers $x_0, x_1, x_2, \dots, x_n$ drawn
from any rectangular distribution. When arranged in a definite order, these
n + 1 values give rise to an n-th difference $\Delta^n$. If these n + 1 numbers are rounded
to k decimal places, the new approximate values $\bar{x}_0, \bar{x}_1, \dots, \bar{x}_n$ will give rise to
another n-th difference $\bar{\Delta}^n$.

We shall investigate the distribution of the error $\Delta^n - \bar{\Delta}^n$.

The explicit expression for $\Delta^n - \bar{\Delta}^n$ is given by:

$$\Delta^n - \bar{\Delta}^n = C_0^n E_n - C_1^n E_{n-1} + C_2^n E_{n-2} - \cdots + (-1)^n C_n^n E_0 = w_0 + w_1 + w_2 + \cdots + w_n \ \text{(say)}$$

where $E_i = x_i - \bar{x}_i$ and $C_i^n$ are the binomial coefficients.
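The expansion of the error in the n-th difference can be checked on a small example. The sketch below is only an illustration: the helper names are hypothetical, exact rational arithmetic is used so that rounding is the only source of error, and Python's built-in rounding (round half to even) stands in for whatever rounding rule a table maker would use.

```python
from fractions import Fraction
from math import comb


def nth_difference(values):
    """Leading n-th forward difference of n + 1 values taken in the given order."""
    vals = list(values)
    while len(vals) > 1:
        vals = [b - a for a, b in zip(vals, vals[1:])]
    return vals[0]


def round_to(x, k):
    """Round the exact rational x to k decimal places."""
    return Fraction(round(x * 10 ** k), 10 ** k)


def difference_error(xs, k):
    """Return the error of the n-th difference, computed directly and by the expansion."""
    n = len(xs) - 1
    rounded = [round_to(x, k) for x in xs]
    errors = [x - r for x, r in zip(xs, rounded)]
    direct = nth_difference(xs) - nth_difference(rounded)
    by_expansion = sum((-1) ** i * comb(n, i) * errors[n - i] for i in range(n + 1))
    return direct, by_expansion


if __name__ == "__main__":
    sample = [Fraction(314159, 10 ** 6), Fraction(271828, 10 ** 6),
              Fraction(141421, 10 ** 6), Fraction(577216, 10 ** 6)]
    print(difference_error(sample, 3))
```

Because each $E_i$ is at most half a unit of the last place in absolute value, the error can never exceed $2^{n-1} \cdot 10^{-k}$, the half-sum of the binomial coefficients.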


1 The results reported in this paper were obtained in the course of work done by the 
Project for the Computation of Mathematical Tables. O. P. No. 765-97-3-10. Work 
Projects Administration, N. Y. C., operated under the sponsorship of Dr. Lyman J. Briggs, 
Director of the National Bureau of Standards. The authors wish to express their appre- 
ciation to the W. P. A. and to the Sponsor of this Project for permission to publish these 
results. 

2 The above functions were computed for x = 0(0.0001)2.0000 to 9 places of decimals and 
x = 2(0.001)10.000 to 9 decimals or significant figures. For the independent method of 
computation of the second differences, see article by A. N. Lowan in the Bulletin of the 
American Mathematical Society, August, 1939. 
The distribution of any one of the E's is

$$f(E)\,dE = \begin{cases} 10^{k}\,dE & \text{if } -\tfrac12 10^{-k} < E < \tfrac12 10^{-k} \\ 0 & \text{elsewhere.} \end{cases}$$

The subsequent developments are based on the fundamental theorem which
states that the characteristic function of the distribution of the sum of any
number of independent random variables is the product of the characteristic
functions of the distributions of the individual variables.³ The characteristic
function, g(t), of f(x) is defined as follows:

$$g(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx. \tag{1}$$

As is well known, the inversion of (1) is given by:

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} g(t)\,dt. \tag{2}$$


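It can be easily seen that the distribution of $w_i$ is:

$$f(w_i)\,dw_i = \begin{cases} \dfrac{10^{k}}{C_i^n}\,dw_i & \text{if } -\tfrac12 10^{-k} C_i^n < w_i < \tfrac12 10^{-k} C_i^n \\ 0 & \text{elsewhere} \end{cases}$$

and its characteristic function is:

$$g_i(t) = \frac{\sin \tfrac12 (10^{-k} C_i^n t)}{\tfrac12 (10^{-k} C_i^n t)}.$$

On the basis of the theorem, above mentioned, the characteristic function of the
distribution of $\Delta^n - \bar{\Delta}^n = y$ (say) is:

$$G(t) = \prod_{i=0}^{n} \frac{\sin \tfrac12 (10^{-k} C_i^n t)}{\tfrac12 (10^{-k} C_i^n t)}. \tag{3}$$

The desired frequency function, F(y), is given by the inversion of

$$G(t) = \int_{-\infty}^{\infty} e^{ity} F(y)\,dy.$$

From (2) and (3), we get:

$$F(y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ity} \prod_{i=0}^{n} \frac{\sin \tfrac12 (10^{-k} C_i^n t)}{\tfrac12 (10^{-k} C_i^n t)}\,dt$$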


3 See, for instance, Harald Cramér, Random Variables and Probability Distributions, 
Cambridge Tracts in Mathematics and Mathematical Physics No. 36 (1937) p. 36. 
which may be written as:

$$F(y) = \frac{10^{k(n+1)}}{\pi \prod_{i=0}^{n} C_i^n} \int_{-\infty}^{\infty} \cos (2ty) \prod_{i=0}^{n} \sin (10^{-k} C_i^n t)\,\frac{dt}{t^{n+1}}. \tag{4}$$

The problem now reduces to the evaluation of the integral in (4). In the
evaluation of this integral, it is convenient to consider even and odd values of n
separately.

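Case 1. When n is even

$$\prod_{i=0}^{n} \sin (10^{-k} C_i^n t) = \frac{(-1)^{n/2}}{2^{n}} \left[ P_{n+1} - P_{n} + P_{n-1} - \cdots + (-1)^{n/2} P_{\frac{n}{2}+1} \right]$$

where $P_{n+1-r}$ denotes the sum of the sines of the sum of n + 1 - r of the angles taken
positively and the remaining r taken negatively, the negative angles being taken
in every combination.⁴ Thus $\cos (2ty) \prod_{i=0}^{n} \sin (10^{-k} C_i^n t)$ can be expressed as
the sum of products of a sine by a cosine. By employing the identity
$\sin A \cos B = \tfrac12 \{\sin (A + B) + \sin (A - B)\}$, each term can be written as the sum
of two sines. Hence the integral under consideration can be written as the
sum of integrals of the form

$$\int_{-\infty}^{\infty} \frac{\sin at}{t^{n+1}}\,dt.$$

Integrating by parts n times in succession, we obtain:

$$\int_{-\infty}^{\infty} \frac{\sin at}{t^{n+1}}\,dt = \frac{(-1)^{n/2} a^{n}}{n!} \int_{-\infty}^{\infty} \frac{\sin at}{t}\,dt. \tag{5}$$

But

$$\int_{-\infty}^{\infty} \frac{\sin at}{t}\,dt = \begin{cases} \pi & \text{for } a > 0 \\ -\pi & \text{for } a < 0. \end{cases}$$

Therefore

$$\int_{-\infty}^{\infty} \frac{\sin at}{t^{n+1}}\,dt = \begin{cases} \dfrac{(-1)^{n/2} a^{n} \pi}{n!} & \text{for } a > 0 \\[2ex] -\dfrac{(-1)^{n/2} a^{n} \pi}{n!} & \text{for } a < 0. \end{cases} \tag{6}$$

By use of (6) the integral in (4) can be readily evaluated.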
Case 2. When n is odd.

$$\prod_{i=0}^{n} \sin (10^{-k} C_i^n t) = \frac{(-1)^{\frac12 (n+1)}}{2^{n}} \left[ Q_{n+1} - Q_{n} + Q_{n-1} - \cdots + (-1)^{\frac12 (n+1)}\, \tfrac12 Q_{\frac12 (n+1)} \right]$$

where $Q_{n+1-r}$ denotes the sum of cosines of the sum of n + 1 - r of the angles
taken positively and the remaining r taken negatively, the negative angles being


4 See E. W. Hobson, Plane Trigonometry, Seventh Edition (1928) pp. 50-51. 
taken in every combination. As in Case 1, the integral in (4) can be expressed
as the sum of integrals of the form:

$$\int_{-\infty}^{\infty} \frac{\cos at}{t^{n+1}}\,dt.$$

Integrating by parts, we obtain:

$$\int_{-\infty}^{\infty} \frac{\cos at}{t^{n+1}}\,dt = -\frac{a}{n} \int_{-\infty}^{\infty} \frac{\sin at}{t^{n}}\,dt.$$

The second member of this equation has been treated in Case 1. It follows
that:

$$\int_{-\infty}^{\infty} \frac{\cos at}{t^{n+1}}\,dt = \begin{cases} \dfrac{(-1)^{\frac12 (n+1)} a^{n} \pi}{n!} & \text{for } a > 0 \\[2ex] -\dfrac{(-1)^{\frac12 (n+1)} a^{n} \pi}{n!} & \text{for } a < 0. \end{cases} \tag{7}$$

By means of the integrals (6) and (7), F(y) can be obtained for any n. The
results for n = 1, 2, and 3 are given below:


n= li 










| 10%y+10 for —10* <y<0 
F(y) ={ —10%y +10" for O<y< 10" 
0 





elsewhere 





3k 
— y? + 10%y +10 for —2.10% <y < —107 


1 o* 


c, v wall nt 
7 y + a for -10* <y< 10 













y’ — 10%y + 10° for 10* <y <2-10~* 


elsewhere 









-., of .. 6 32-10* 
54 4 ° a +—9-¥ FZ 
for —4.10* <y < —3-10“% 
Fly) =) 19 10 . 10% © 5.10° ai - 
a y “—— i 0 y+ oF for —3-10 "<y < —2-10 
10” 10° 


—~ nd f —3.10° < ay < ~10" 
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n = 3.—cont. 
10“ 10" » , 8-10° : 
ee for -10*<y <0 











a, £,.,37v _ 
a ain ee ital f <y< 10" 
9 y 9 y 7 or 0<y< 10 
2k k 
-* y+ = for 10* <y <2.10° 
FY) m7 4k 3k 2k k 
(cont.) re. ww... 5-10 - vi 
ll” TOA IF cs oe 2.10" <y <3.1 
54 Y 9 Yt oyt a7 for <y < 3-10 
| 10" 5 , 2-10% . 8.10% , 32-10! 
54 Y 9 4 g 4 27 
for 3-10" <y < 4-10" 
| QO elsewhere. 





\ 





In general, F(y) is an even continuous function, vanishing for $|y| > 2^{n-1} \cdot 10^{-k}$
and defined by different n-th degree polynomials in different intervals.
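These piecewise polynomials can be cross-checked numerically. The sketch below does not follow the characteristic-function route of the text. It uses instead the standard closed form for the density of a sum of independent rectangular variates, a signed sum of truncated powers, which is brought in here purely as an assumption for illustration. The function name and the default k = 0 (errors measured in units of the last place) are likewise illustrative.

```python
from itertools import product
from math import comb, factorial


def error_density(y, n, k=0):
    """Density F(y) of the error in the n-th difference for entries rounded to k places.

    The terms w_i are treated as independent rectangular variates of total
    width C(n, i) * 10**(-k); the density of their sum is a signed sum of
    truncated powers over all choices of sign.
    """
    unit = 10.0 ** (-k)
    widths = [comb(n, i) * unit for i in range(n + 1)]
    m = len(widths)
    norm = float(factorial(m - 1))
    for w in widths:
        norm *= w
    total = 0.0
    for signs in product((1, -1), repeat=m):
        sign = 1
        arg = y
        for s, w in zip(signs, widths):
            sign *= s
            arg += s * w / 2.0
        if arg > 0.0:
            total += sign * arg ** (m - 1)
    return total / norm


if __name__ == "__main__":
    for n, y in ((1, 0.5), (2, 0.0), (2, 1.5), (3, 0.0), (3, 2.5)):
        print(n, y, error_density(y, n))
```

With k = 0 the values agree with the polynomials for n = 1, 2 and 3, for instance F(0) = 1/2 when n = 2 and F(0) = 8/27 when n = 3.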

The above frequency functions were derived on the assumption that the x's
are random numbers drawn from a rectangular distribution. However, the
results may be applied to the entries of a mathematical table provided the
rounding errors are horizontally distributed and the difference under
consideration is of such an order that the digits in the decimal place corresponding to the
last place given in the table are also horizontally distributed. These conditions
are frequently satisfied. Since data on the errors in the second differences of a
table of $\mathrm{Ci}(x) = \int_{\infty}^{x} \frac{\cos x}{x}\,dx$ given to 9 decimals were available, a study was
made of a sample of 1000 of these errors. The theoretical and observed
frequencies for this sample are given in the following table:


















Error                    -2     -1.5    -1.0    -.5      0      .5     1.0     1.5
                         to      to      to      to      to     to      to      to    Total
                        -1.5    -1.0    -.5      0      .5     1.0     1.5      2

Theoretical Frequency   10.4    72.9   177.1   239.6   239.6  177.1    72.9    10.4   1000.0

Observed Frequency        9      68     161     272     243    174      63      10    1000





By applying Pearson's $\chi^2$-test, it is found that the observed frequencies show no
significant deviations from the theoretical frequencies.
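The comparison can be reproduced in a few lines. The sketch below integrates the n = 2 frequency function over the eight classes, with errors measured in units of the last decimal place (an assumption about the scale of the class limits), and forms Pearson's statistic from the observed counts quoted in the table. The reference value 14.07, the 5 per cent point of $\chi^2$ with 7 degrees of freedom, is a standard tabulated value and not a figure from the paper.

```python
def prob_zero_to(y):
    """P{0 < error < y} for the second difference, for 0 <= y <= 2 units of the last place."""
    if y <= 1.0:
        # integral of 1/2 - u^2/4 from 0 to y
        return y / 2.0 - y ** 3 / 12.0
    # integral continues with (2 - u)^2 / 4 on the interval (1, 2)
    return 0.5 - (2.0 - y) ** 3 / 12.0


def theoretical_frequencies(sample_size=1000):
    """Expected counts in the eight classes of width 0.5 between -2 and 2."""
    edges = [0.0, 0.5, 1.0, 1.5, 2.0]
    upper_half = [sample_size * (prob_zero_to(b) - prob_zero_to(a))
                  for a, b in zip(edges, edges[1:])]
    return upper_half[::-1] + upper_half  # the density is even in y


def chi_square(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    observed = [9, 68, 161, 272, 243, 174, 63, 10]
    expected = theoretical_frequencies()
    print([round(e, 1) for e in expected])
    print(round(chi_square(observed, expected), 2))
```

The statistic comes out a little under 8, well below 14.07, which agrees with the conclusion that the deviations are not significant.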
ON TESTING THE HYPOTHESIS THAT TWO SAMPLES HAVE BEEN
DRAWN FROM A COMMON NORMAL POPULATION


By B. A. LENGYEL¹


1. Introduction. This paper is devoted to the problem of testing the hypothe- 
sis that two samples of 2, 3 and 4 variables, and of equal size, have been drawn 
from a common unspecified normal population. It is, in a certain sense, a 
continuation of J. W. Fertig’s papers [1, 2] which were devoted to the problem 
of testing the hypothesis that one or more samples of n variables have been 
drawn from a completely or partially specified normal population. 

For the sake of application to biological research, it is important to have 
means of determining whether two samples may have come from a common 
population even if this population is unknown. Moreover, it is often imperative 
to test two samples with respect to all their variables simultaneously. Much 
valuable information may be lost if the variables are tested individually. One 
has to consider not only the fact that two samples which differ almost signifi- 
cantly from each other in each variable separately might be significantly different 
if the probabilities would be combined, but one has to take account of the 
possible correlations between the variables which are completely disregarded if 
the tests are applied to each variable separately. It is not difficult to imagine 
two samples of two variables with identical means and variances and signifi- 
cantly different correlation coefficients. 

J. Neyman and E. S. Pearson [3] have investigated the problem of testing
statistical hypotheses in general. They have developed the method of likelihood
ratios. It is beyond the scope of the present paper to give an account of this 
theory; we have to restrict ourselves to statements concerning the fundamental 
concepts we are going to apply to our specific problem. 

A sample with one variable and of size N can be regarded as a point in an 
N-dimensional space. The acceptance or rejection of a hypothesis concerning 
this sample will depend on whether or not the point representing the sample is 
contained in certain critical regions determined by the hypothesis and by the 
statistical criterion that is to be applied. The choice of the critical regions is of 
fundamental importance; its implications have been thoroughly discussed by 
Neyman and Pearson. These authors found a useful criterion for testing the 
hypothesis that a sample was drawn from a specified member of an admissible 
set of populations by introducing the ratio of the likelihood that the sample 
was drawn from the specified population to the maximum value of the likelihood 
for all populations in the admissible set (Cf. §2). This ratio $\lambda$ can vary between


1 From the Research Service of the Worcester State Hospital, Worcester, Massachusetts. 
0 and 1. The association between values of $\lambda$ and the credibility of the
hypothesis in question is such that the greater the value of $\lambda$ the greater the degree of
tenability of the hypothesis. $\lambda$ = constant defines a surface in the sample
space. These surfaces are the contours of the critical regions associated with
the acceptance or rejection of the hypothesis. A hypothesis is rejected as
untenable if $\lambda$ is so small that

$$\int_{0}^{\lambda} P(\lambda)\,d\lambda < \alpha,$$

where $\alpha$ is some value arbitrarily small, say .01 or .05 and $P(\lambda)$ is the
distribution of $\lambda$ if the hypothesis is true.

This method of testing hypotheses is evidently not restricted to one sample
with one variable, nor is it restricted to simple hypotheses. A simple hypothesis
is one which is associated with one completely specified population. A
composite hypothesis is one which is associated with a subset of the admissible
populations. For example, the hypothesis that a sample with n variables has
been drawn from a normal population with means $a_1, a_2, \dots, a_n$ whatever may
be the variances and correlation coefficients is a composite hypothesis. Such is
the hypothesis that two or more samples have come from a common but
unspecified population.

The problem of several samples with one variable was discussed by Neyman 
and Pearson [4, 5]; the problem of several samples with two variables by Pearson 
and Wilks [6]. In another paper Wilks [7] derived formulas for $\lambda$ and the
moments of $P(\lambda)$ for the most general case of k samples of n variables. For the
sake of practical applications it is necessary to have tables for the limits of
significance of $\lambda$. Such tables have been prepared for samples with one variable
by Neyman and Pearson and for completely or at least partially specified
populations and more than one variable by Fertig [1, 2]. The present paper
contains tables for the case of 2, 3 and 4 variables and a common unspecified normal
population. Since the case of two variables has been theoretically solved by
Pearson and Wilks we shall have to compare our results with those of the above
authors who derived the distribution $P(\lambda)$ but did not compute tables.

Our procedure is the following: We start with the moments of $P(\lambda)$ as given
by Wilks and approximate the distribution of $\lambda^{1/N}$ by a suitable function. Then
we compute the limits of significance for this approximating function. This
procedure was originally suggested by Neyman and Pearson and was applied
with some modifications by Fertig.

§2 contains the definition of the likelihood ratio $\lambda$; §3 deals with the moments
of its distribution for the case of a common unspecified population. In §4 we
introduce the approximating distribution $y = Cx^{p-1}(1 - x)^{q-1}$ and discuss the
determination of the parameters p and q. In §5 we give an independent
derivation of the formula obtained by Pearson and Wilks for $P(\lambda)$ for the case of two
samples with two variables and compare our approximation with the exact
formula. §6 deals with the determination of the limits of significance and
contains the tables. §7 is devoted to an example.


2. Definition of $\lambda$. Let $C_\pi$ denote the probability of obtaining a given sample
from a population $\pi$. $C_\pi$ will depend on the parameters of the population and
the data of the sample. Let $\Omega$ be the set of all admissible populations and $\omega$ a
subset of $\Omega$ which corresponds to a certain hypothesis that is to be tested.
Intuitively one would consider a hypothesis tenable or plausible if it gives a high
probability density for the given sample if compared with other possible
hypotheses. Following this reasoning Neyman and Pearson defined the likelihood
of a hypothesis as the ratio of $\max_{\pi \in \omega} C_\pi$ to $\max_{\pi \in \Omega} C_\pi$. In the special case which
we propose to investigate, the populations are assumed to be normal. We wish
to test the hypothesis that two given samples have come from a common
unspecified population. Hence $\lambda$, the likelihood of this hypothesis, is the maximum
likelihood that the samples have come from a common normal population
divided by the maximum likelihood that the samples have come from any two
normal populations.

The value of $\lambda$ can be expressed by the variates of the samples by means of the
following formula [Cf. [7] p. 489]

$$\lambda = \left[ \frac{S_1}{S_0} \right]^{\frac12 N_1} \left[ \frac{S_2}{S_0} \right]^{\frac12 N_2} \tag{1}$$

where $S_1$ and $S_2$ are the generalized variances² of the samples and $S_0$ is the
generalized variance of the sample obtained by pooling the two given samples.
$N_1$ is the size of the first sample, $N_2$ the size of the second. In case of equal
samples to which we shall restrict ourselves $N_1 = N_2 = N$; thus

$$\lambda^{1/N} = \frac{S_1^{1/2} S_2^{1/2}}{S_0} \tag{2}$$


3. The Moments of the $\lambda$ Distribution. The distribution of $\lambda$ depends on the
number of variables, the number and the size of the samples and on the kind of 
hypothesis that is to be tested; e.g. that the samples have come from a common 
unspecified population. This distribution has been evaluated for the case of 
equal samples of one and two variables and our hypothesis concerning a common 
unspecified population. The general form of this distribution is still unknown 
and even the known formula for two variables is not very suitable for computa- 
tion. Therefore we shall follow the procedure introduced by Neyman and 
Pearson [4] and we shall use the known moments of the unknown distribution 


2 The generalized variance of a sample is a determinant, the elements of which are the
variances and covariances. Thus, for two variables x and y the generalized variance
$S = S_x^2 S_y^2 (1 - r^2)$; where $S_x^2$ and $S_y^2$ denote the variances of x and y respectively, r the
correlation coefficient.
function $P(\lambda)$ in order to construct an approximation to $P(\lambda)$. For two equal
samples of n variables the moments of $P(\lambda)$ about the origin are [Cf. [7] p. 490]

$$M_h = 2^{nNh} \prod_{i=1}^{n} \frac{\Gamma\left(\frac{2N - i}{2}\right)}{\Gamma\left(\frac{2N(1 + h) - i}{2}\right)} \left[ \frac{\Gamma\left(\frac{N(1 + h) - i}{2}\right)}{\Gamma\left(\frac{N - i}{2}\right)} \right]^{2} \tag{3}$$

for h = 1, 2, 3, ....

Equation (2) readily suggests that we should compute or approximate the
distribution of $\lambda^{1/N}$ rather than that of $\lambda$. Let $\mu_h$ denote the h-th moment
of $P(\lambda^{1/N})$; then $\mu_h = M_{h/N}$ follows immediately from the definition of
$M_h = \int_0^1 \lambda^h P(\lambda)\,d\lambda$. Hence in order to obtain the $\mu$'s we have to replace Nh by
h in (3).

$$\mu_h = 2^{nh} \prod_{i=1}^{n} \frac{\Gamma\left(\frac{2N - i}{2}\right)}{\Gamma\left(\frac{2N + 2h - i}{2}\right)} \left[ \frac{\Gamma\left(\frac{N + h - i}{2}\right)}{\Gamma\left(\frac{N - i}{2}\right)} \right]^{2} \tag{4}$$

This expression can be much simplified for all given values of h and n.
However, there is no need for such simplification, because one has to compute the
first moments only. All higher moments can be expressed by means of the first
moments for various N's. The dependence of $\mu_h$ on N is evident from (4); we
shall indicate it by writing $\mu_h(N)$. The ratio of two subsequent moments is

$$\frac{\mu_{h+1}(N)}{\mu_h(N)} = 2^{n} \prod_{i=1}^{n} \frac{\Gamma\left(\frac{2N + 2h - i}{2}\right)}{\Gamma\left(\frac{2N + 2h + 2 - i}{2}\right)} \left[ \frac{\Gamma\left(\frac{N + h + 1 - i}{2}\right)}{\Gamma\left(\frac{N + h - i}{2}\right)} \right]^{2} = \mu_1(N + h). \tag{5}$$

Equation (5) contains an important relation of the moments. In fact from (5)
follows:

$$\begin{aligned} \mu_2(N) &= \mu(N)\,\mu(N + 1), \\ \mu_3(N) &= \mu(N)\,\mu(N + 1)\,\mu(N + 2), \\ &\ \ \vdots \\ \mu_h(N) &= \mu(N)\,\mu(N + 1) \cdots \mu(N + h - 1), \end{aligned} \tag{6}$$

where the 1's from $\mu_1(N)$'s have been omitted. This last group of equations
holds for any number of variables. Thus we have to compute $\mu(N)$ for each N,
then multiplication gives the higher moments.
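The recurrence between the moments lends itself to a quick numerical check. The sketch below evaluates formula (4) through logarithms of gamma functions; it takes the leading factor as $2^{nh}$, an assumption made so that the case n = 2 reproduces the values .9164 and .9499 quoted in §4 for N = 30 and N = 50. The function name is illustrative.

```python
from math import exp, lgamma, log


def moment(h, N, n):
    """h-th moment of lambda^(1/N) for two equal samples of size N with n variables.

    Evaluated from formula (4) with the leading factor 2**(n*h); requires N > n.
    """
    total = n * h * log(2.0)
    for i in range(1, n + 1):
        total += lgamma((2 * N - i) / 2.0) - lgamma((2 * N + 2 * h - i) / 2.0)
        total += 2.0 * (lgamma((N + h - i) / 2.0) - lgamma((N - i) / 2.0))
    return exp(total)


if __name__ == "__main__":
    print(round(moment(1, 30, 2), 4), round(moment(1, 50, 2), 4))
    # relation (6): the third moment is a product of three first moments
    product_form = moment(1, 20, 3) * moment(1, 21, 3) * moment(1, 22, 3)
    print(moment(3, 20, 3), product_form)
```

Working with `lgamma` rather than the gamma function itself avoids overflow for the larger sample sizes.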
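For n = 2:

$$\mu(N) = \frac{(N - 2)^{2}}{(N - 1)(N - \frac12)}$$

For n = 3:

$$\mu(N) = \left[ \frac{\Gamma\left(\frac{N - 2}{2}\right)}{\Gamma\left(\frac{N - 3}{2}\right)} \right]^{2} \frac{2 (N - 2)^{2}}{(N - \frac12)(N - 1)(N - \frac32)}$$

For n = 4:

$$\mu(N) = \frac{(N - 2)^{2} (N - 4)^{2}}{(N - 1)(N - \frac12)(N - 2)(N - \frac32)}$$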


4. Approximation of the distribution of $\lambda^{1/N}$. Following the procedure of
Neyman and Pearson we shall use the moments computed in the previous
section for the fitting of a Pearson frequency curve to the unknown distribution
$P(\lambda^{1/N})$. Since $0 \le \lambda \le 1$ it is natural to fit a frequency curve of the following
type

$$y = C x^{p-1} (1 - x)^{q-1}, \tag{7}$$

where

$$C = \frac{1}{B(p, q)} = \frac{\Gamma(p + q)}{\Gamma(p)\,\Gamma(q)}.$$

The first two moments are sufficient to determine the two parameters p and q.
The moments of the distribution (7) are readily computed:

$$\nu_1 = \frac{p}{p + q}, \qquad \nu_2 = \frac{p}{p + q} \cdot \frac{p + 1}{p + q + 1}. \tag{8}$$

In general:

$$\frac{\nu_{h+1}}{\nu_h} = \frac{p + h}{p + q + h}. \tag{9}$$

Equation (9) corresponds to equation (5) since one can write $\nu_1 = \nu(p, q)$, then
(9) becomes

$$\frac{\nu_{h+1}(p, q)}{\nu_h(p, q)} = \nu(p + h, q). \tag{10}$$

At first sight the similarity of equations (5) and (10) would suggest that one
should choose p and q so that $\nu(p + h, q) = \mu(N + h)$, for h = 0, 1, 2, 3, ....
However, this cannot be done because the equations which express the equality
of the first two moments:

$$\nu(p, q) = \mu(N) \tag{11}$$

$$\nu(p + 1, q) = \mu(N + 1) \tag{12}$$


3 The case n = 1 is omitted here since it has been treated by Neyman and Pearson [4]. 
determine p and q completely. The quantities $\nu(p + h, q)$ can be only
approximately equal to $\mu(N + h)$ for h > 1. The goodness of fit may be tested by
comparing the third and fourth moments.

The advantage of equations (5) and (10) is that once p and q have been
computed one does not have to compare

$$\nu_3 = \nu_2\,\nu(p + 2, q)$$

with

$$\mu_3 = \mu_2\,\mu(N + 2),$$

but since $\nu_2 = \mu_2$ one can compare $\nu(p + 2, q)$ with $\mu(N + 2)$. Similarly the
comparison of the fourth moments can be replaced by comparing $\nu(p + 3, q)$
with $\mu(N + 3)$. It has to be remembered that once the sequence of $\mu(N)$'s has
been computed for all N's, each of its terms can be used four times in the
determination and the checking of p's and q's.

The general procedure for the determination of p and q is to compute the
$\mu(N)$'s first and then solve the equations (11) and (12): i.e.

$$\frac{p}{p + q} = \mu(N), \tag{13}$$

$$\frac{p + 1}{p + q + 1} = \mu(N + 1). \tag{14}$$

The solution of these equations is:

$$p = \mu(N)\,\frac{1 - \mu(N + 1)}{\mu(N + 1) - \mu(N)}, \tag{15}$$

$$q = \left[ \frac{1}{\mu(N)} - 1 \right] p. \tag{16}$$

As N increases $\mu(N)$ approaches 1 from below; $\mu(N + 1) - \mu(N)$ will be very
small. E.g., for n = 2 as N varies from 30 to 50 $\mu(N)$ increases from .9164
to .9499. It is easily seen that small errors in $\mu$ may produce much larger errors
in p and q. For n = 4 it was necessary to compute $\mu$ to nine decimal places
to get p and q to three decimal places accurately. For n = 2 equations (13) and
(14) become

$$\frac{p}{p + q} = \frac{(N - 2)^{2}}{(N - 1)(N - \frac12)}, \tag{17}$$

$$\frac{p + 1}{p + q + 1} = \frac{(N - 1)^{2}}{N (N + \frac12)}. \tag{18}$$

These can be solved explicitly

$$p = (N - 2) \left[ 1 - \frac{3 (N - 1)}{5N^{2} - 9N + 1} \right], \tag{19}$$

$$q = 2.5 + \frac{4.5}{5N^{2} - 9N + 1}. \tag{20}$$

The last two equations enable us to compute p and q directly and thus avoid the
more laborious computation by means of the $\mu(N)$'s. For n = 3 and 4, however,
such a short cut was not found. The computation of $\mu$'s for n = 4 was
facilitated by the following relation:

$${}_4\mu(N) = {}_2\mu(N)\,\frac{(N - 4)^{2}}{(N - 2)(N - \frac32)} \tag{21}$$

where the suffixes denote the number of variables in the problem to which the
$\mu$'s refer. Thus the computed values of ${}_2\mu(N)$ were again used. Eight-place
logarithms were used in the computation of ${}_3\mu(N)$ from the formula at the
end of §3.
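The fitting step for two variables is short enough to sketch. The code below solves the pair of moment equations for p and q from two successive first moments and compares the result with the explicit expressions in N. It assumes the n = 2 first moment $\mu(N) = (N - 2)^2 / ((N - 1)(N - \frac12))$, which reproduces the values .9164 and .9499 quoted in the text; all function names are illustrative.

```python
def mu_two_variables(N):
    """First moment of lambda^(1/N) for n = 2 (assumed closed form)."""
    return (N - 2) ** 2 / ((N - 1) * (N - 0.5))


def fit_from_moments(mu_n, mu_n1):
    """Solve p/(p+q) = mu(N) and (p+1)/(p+q+1) = mu(N+1) for p and q."""
    p = mu_n * (1.0 - mu_n1) / (mu_n1 - mu_n)
    q = (1.0 / mu_n - 1.0) * p
    return p, q


def fit_explicit(N):
    """Explicit solution for n = 2 in terms of N."""
    d = 5 * N * N - 9 * N + 1
    p = (N - 2) * (1.0 - 3.0 * (N - 1) / d)
    q = 2.5 + 4.5 / d
    return p, q


if __name__ == "__main__":
    for N in (10, 30, 50):
        print(N, fit_from_moments(mu_two_variables(N), mu_two_variables(N + 1)),
              fit_explicit(N))
```

The denominator $\mu(N + 1) - \mu(N)$ shrinks as N grows, which is why the text insists on carrying many decimal places in $\mu$ when no explicit solution is available.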














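5. The distribution of $\lambda^{1/N}$ for two variables. For two variables it is possible
to evaluate the distribution of $\lambda^{1/N}$ or some other suitable power of $\lambda$ directly
from the moments. Pearson and Wilks (Cf. [6] pp. 364-368) derived the
distribution of $\lambda^{1/N}$ for this case. Their method was adapted to the treatment of
more general problems than ours. It is possible to derive the distribution of
$\lambda^{1/N}$ in our special case more directly:

For n = 2 the moments of $\lambda^{1/N}$ are:

$$\mu_h = 2^{2h}\,\frac{\Gamma(N - \frac12)\,\Gamma(N - 1)}{\Gamma(N + h - \frac12)\,\Gamma(N + h - 1)} \left[ \frac{\Gamma\left(\frac{N + h - 1}{2}\right) \Gamma\left(\frac{N + h - 2}{2}\right)}{\Gamma\left(\frac{N - 1}{2}\right) \Gamma\left(\frac{N - 2}{2}\right)} \right]^{2}, \qquad h = 1, 2, 3, \dots \tag{22}$$

Applying the following transformation formula⁴

$$\Gamma(z)\,\Gamma(z + \tfrac12) = \frac{\sqrt{\pi}}{2^{2z - 1}}\,\Gamma(2z) \tag{23}$$

to

$$z_1 = \tfrac12 (N + h - 2), \qquad z_2 = \tfrac12 (N - 2), \qquad z_3 = N - 1 \quad \text{and} \quad z_4 = N + h - 1$$

(22) becomes

$$\mu_h = 2^{2h} \left[ \frac{\Gamma(N + h - 2)}{\Gamma(N - 2)} \right]^{2} \frac{\Gamma(2N - 2)}{\Gamma(2N + 2h - 2)}. \tag{24}$$

Thus $\mu_h$ is of the form F(N + h)/F(N) with

$$F(N) = 2^{2N}\,\frac{\Gamma(N - 2)^{2}}{\Gamma(2N - 2)}.$$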





4 Cf. Whittaker and Watson, Modern Analysis, 4th ed., p. 240.
Our problem is to find a function P(t) such that

$$\mu_h = \int_0^1 t^{h} P(t)\,dt, \tag{25}$$

for h = 0, 1, 2, ....

This problem is solved if we can find a function of N and t, say, p(N, t) such
that

$$F(N + h) = \int_0^1 t^{h}\,p(N, t)\,dt \tag{26}$$

for h = 0, 1, 2, ....

N and h enter the left side of equation (26) symmetrically. The same must be
true for the right side. Hence p(N, t) must have the form $t^{N} f(t)$ where f(t) is
independent of N. If then (26) is satisfied for all N and h = 0, it is also satisfied
for all N and all h.

Let us now examine F(N). Applying again the transformation (23) we can
bring it to the form:

$$F(N) = \frac{2^{3} \sqrt{\pi}\,\Gamma(N - 2)}{(N - 2)\,\Gamma(N - \frac12)} = \frac{2^{4}\,\Gamma(N - 2)\,\Gamma(\frac32)}{(N - 2)\,\Gamma(N - \frac12)} = \frac{2^{4}}{N - 2}\,B(N - 2, \tfrac32). \tag{27}$$

Now $B(N - 2, \frac32)$ can be represented as an integral of the desired type

$$B(N - 2, \tfrac32) = \int_0^1 t^{N-3} (1 - t)^{1/2}\,dt. \tag{28}$$

We set $p(N, t) = 2^{4} t^{N-3} g(t)$ and seek to determine g(t) so that (26) will be
satisfied for all N and for h = 0: i.e.,

$$F(N) = \frac{2^{4}}{N - 2}\,B(N - 2, \tfrac32) = 2^{4} \int_0^1 t^{N-3} g(t)\,dt. \tag{29}$$

An integration by parts with g(1) = 0 gives

$$\frac{1}{N - 2}\,B(N - 2, \tfrac32) = -\frac{1}{N - 2} \int_0^1 t^{N-2} g'(t)\,dt. \tag{30}$$

This equation evidently holds for all N if and only if

$$-t\,g'(t) = \sqrt{1 - t}. \tag{31}$$

This differential equation is readily solved by the substitution of $y = \sqrt{1 - t}$.
In fact it becomes

$$\frac{dg(y)}{dy} = \frac{2y^{2}}{1 - y^{2}} = -2 + \frac{1}{1 + y} + \frac{1}{1 - y}. \tag{32}$$

Hence

$$g = \log \frac{1 + y}{1 - y} - 2y = 2 \left[ \log \frac{1 + \sqrt{1 - t}}{\sqrt{t}} - \sqrt{1 - t} \right]. \tag{33}$$

The complete solution for the distribution function is

$$P(t) = \frac{\Gamma(2N - 2)}{2^{2N-5}\,\Gamma(N - 2)^{2}}\; t^{N-3} \left[ \log \frac{1 + \sqrt{1 - t}}{\sqrt{t}} - \sqrt{1 - t} \right] \tag{34}$$

with $t = \lambda^{1/N}$ in accordance with the formula of Pearson and Wilks.⁵ Integration
by parts gives

$$P(\lambda^{1/N} < t) = \frac{\Gamma(2N - 2)}{2^{2N-5}\,\Gamma(N - 1)\,\Gamma(N - 2)} \left\{ t^{N-2} \left[ \log \frac{1 + \sqrt{1 - t}}{\sqrt{t}} - \sqrt{1 - t} \right] + \frac12 \int_0^{t} x^{N-3} \sqrt{1 - x}\,dx \right\}. \tag{35}$$

One can use this last equation to determine the limits of significance.
However, this was not done when the tables of this paper were computed. The
approximation of the distribution function by the function described in equation
(7) was deemed sufficient and the use of the tables of the incomplete beta function
greatly facilitated the computation.

In concluding this section we wish to demonstrate the goodness of
approximation of the exact distribution function by a function of the type $C t^{p-1} (1 - t)^{q-1}$
with p and q given by equations (19) and (20).

For small values of t the shape of the curve is determined by the exponent of
t, which is exactly N - 3 for the distribution function and nearly $N - 3 - \frac35$
for the approximating function. For large t; i.e., small 1 - t, the exponent of
(1 - t) is the determining factor. By (32) we have

$$g(\sqrt{1 - t}) = 2 \left[ \frac{(1 - t)^{3/2}}{3} + \frac{(1 - t)^{5/2}}{5} + \cdots \right],$$

or approximately $\frac23 (1 - t)^{3/2}$. For the approximating curve $q - 1 = \frac32 + O(1/N^2)$
which is even better agreement. It is easily seen that the goodness of
approximation increases with N.
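A numerical check of the exact frequency function is straightforward. The sketch below codes the density for two variables in the form $C\,t^{N-3} [\log((1 + \sqrt{1 - t})/\sqrt{t}) - \sqrt{1 - t}]$ with $C = \Gamma(2N - 2) / (2^{2N-5}\,\Gamma(N - 2)^2)$, and verifies by Simpson's rule that it integrates to one and has first moment $(N - 2)^2 / ((N - 1)(N - \frac12))$. The constant and the moment are taken as assumptions from the surrounding derivation; the integration routine and the names are illustrative.

```python
from math import exp, lgamma, log, sqrt


def exact_density(t, N):
    """Frequency function of t = lambda^(1/N) for two samples of size N, two variables."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    log_c = lgamma(2 * N - 2) - (2 * N - 5) * log(2.0) - 2.0 * lgamma(N - 2)
    root = sqrt(1.0 - t)
    return exp(log_c) * t ** (N - 3) * (log((1.0 + root) / sqrt(t)) - root)


def simpson(f, a, b, steps=20000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3.0


if __name__ == "__main__":
    N = 10
    area = simpson(lambda t: exact_density(t, N), 0.0, 1.0)
    mean = simpson(lambda t: t * exact_density(t, N), 0.0, 1.0)
    print(area, mean, (N - 2) ** 2 / ((N - 1) * (N - 0.5)))
```

The same routine could be pointed at the approximating curve to compare the two shapes for a given N.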


6. Determination of the Levels of Significance. The final task was to
compute the values of x which satisfy the equations:

$$I_x(p, q) = \frac{1}{B(p, q)} \int_0^{x} t^{p-1} (1 - t)^{q-1}\,dt = \alpha,$$

with $\alpha$ = .01 and $\alpha$ = .05. This was done by interpolation in the Tables of the
Incomplete Beta Function [8]. In these tables the argument x increases by steps
of .01. A value $x_0$ was determined by inspection, so that $I_{x_0}(p, q) < \alpha$ but
$I_{x_1}(p, q) > \alpha$ where $x_1 = x_0 + .01$. The values of $I_{x_0}(p, q)$ and $I_{x_1}(p, q)$ were


> Cf. [6] p. 368 Equation 60. (2? = 2). 
determined by interpolation with respect to p and q, using the two-dimensional
Everett formula, neglecting fourth and higher differences. x was then
determined by linear interpolation. It is worth while to mention that the terms of
second order in Everett's formula decreased quite rapidly as N increased. Once
this was noticed some labor was saved by not computing the terms of
second order for values of N between 30 and 50 but by estimating the second
order terms from those obtained for N = 30, 40 and 50.
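The bracketing and linear interpolation in x can be imitated without printed tables. In the sketch below the incomplete beta function is obtained by Simpson's rule in place of the tabulated values and Everett's formula, which is a substitution made for illustration and not the computation used for the paper's tables. It assumes p > 1 and q > 1 so that the integrand stays finite, and the function names are hypothetical.

```python
from math import exp, lgamma


def incomplete_beta(x, p, q, steps=2000):
    """I_x(p, q) by Simpson's rule (assumes p > 1, q > 1 and 0 <= x <= 1)."""
    if x <= 0.0:
        return 0.0
    log_b = lgamma(p) + lgamma(q) - lgamma(p + q)

    def integrand(t):
        return exp(-log_b) * t ** (p - 1.0) * (1.0 - t) ** (q - 1.0)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * integrand(j * h)
    return total * h / 3.0


def significance_limit(alpha, p, q):
    """Bracket alpha between grid points 0.01 apart, then interpolate linearly in x."""
    for j in range(100):
        x0, x1 = j / 100.0, (j + 1) / 100.0
        i0, i1 = incomplete_beta(x0, p, q), incomplete_beta(x1, p, q)
        if i0 < alpha <= i1:
            return x0 + 0.01 * (alpha - i0) / (i1 - i0)
    return 1.0


if __name__ == "__main__":
    # p and q for n = 2, N = 10 from the explicit solution of section 4
    d = 5 * 10 * 10 - 9 * 10 + 1
    p, q = 8 * (1.0 - 27.0 / d), 2.5 + 4.5 / d
    for alpha in (0.01, 0.05):
        print(alpha, significance_limit(alpha, p, q))
```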


1/N 


Levels of Significance of 


Sample 2 | : 3 s 
Size Variables Variables Variables 


N 1% 


10 . ‘ | .238 
1] ‘ ‘ . 282 
12 | ‘ .323 
13 ‘ ‘ .360 
14 ‘ ‘ .393 
15 ‘ ‘ .423 
16 [4 ‘ | ,451 

.476 

. 500 

.521 
04] 
.576 
.606 
.632 
.655 
7. An Example. The problem chosen to illustrate the use of the tables is
taken from a study on insulin-treated schizophrenic patients of the Worcester
State Hospital. It was attempted to differentiate between those patients who
recovered after treatment and those who did not recover. Blood constituents
and blood pressure were determined among other variables.

The variables in this example are designated as x = blood phosphorus, y =
cholesterol in mg./100 cc., z = blood pressure in mm. Hg. The statistics for the
10 "recovered" patients are:








$S_x^2 = 2.222$          $S_y^2 = 376.50$          $S_z^2 = 51.97$
$r_{12} S_x S_y = -1.121$    $r_{13} S_x S_z = -8.217$    $r_{23} S_y S_z = 12.51$

For the 10 "not-recovered" patients

$S_x^2 = 3.120$          $S_y^2 = 816.19$          $S_z^2 = 96.32$
$r_{12} S_x S_y = 26.23$     $r_{13} S_x S_z = 2.92$      $r_{23} S_y S_z = 65.78$
For the total group of 20 


S2 = 3.034 S? = 609.02 S? = 83.09 
r2S2S,y = 194] ri352S2 = — .845 138,82 = 15.99 
























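These values give for the generalized sample variances (the determinants of the 
three sets of variances and product moments) 17,462; 168,628; and 143,904, 
respectively. Hence the value of the criterion is 

$$\frac{\sqrt{17{,}462 \times 168{,}628}}{143{,}904} = .377.$$

The 5% limit of significance is .328, hence the two groups do not differ 
significantly from each other. 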
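
The arithmetic of this example can be checked with a short Python sketch. It rests on two assumptions that are an interpretation rather than something stated in the text: that the three large numbers are determinants of the variance and product-moment matrices, and that the criterion compared with the tabulated limit is the square root of the product of the two group values divided by the total-group value. The helper name `det3` is hypothetical.

```python
import math

def det3(a, b, c, d, e, f):
    """Determinant of the symmetric matrix [[a, d, e], [d, b, f], [e, f, c]]."""
    return a * (b * c - f * f) - d * (d * c - f * e) + e * (d * f - b * e)

# "Not-recovered" group: variances and product moments as quoted in the text.
det_not_recovered = det3(3.120, 816.19, 96.32, 26.23, 2.92, 65.78)

# Criterion under the stated assumption, built from the three quoted values.
criterion = math.sqrt(17462 * 168628) / 143904

print(round(det_not_recovered))  # compare with the second quoted value, 168,628
print(round(criterion, 3))       # compare with the 5% limit .328
```

Under these assumptions the criterion lies above the 5% limit, which is in line with the conclusion that the groups do not differ significantly.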
NOTES 


This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


THE DISTRIBUTION OF “STUDENT’S” RATIO FOR SAMPLES OF 
TWO ITEMS DRAWN FROM NON-NORMAL UNIVERSES 


By Jack LADERMAN 


The fundamental assumption in the derivation of “Student’s” distribution¹ 

$$\frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi(n-1)}\;\Gamma\left(\frac{n-1}{2}\right)}\left(1 + \frac{t^2}{n-1}\right)^{-n/2}$$

is that the universe sampled is normal. When the universe sampled is non- 
normal and $n$ is small, the distribution of $t$ differs considerably from “Student’s” 
distribution. In 1929, Rider² derived the distribution of $t$ for samples of two 
drawn from the rectangular distribution 

$$f(x)\,dx = \begin{cases} dx & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere.} \end{cases}$$

In this paper, the formal expression for the distribution of ¢ will be derived 
for samples of two drawn from any population having a continuous frequency 
function. A geometrical method similar to that employed by Rider will be used. 

Let the universe sampled have the frequency function $f(x)$, with zero mean, 
and let $f(x)$ be greater than zero from $x = a$ to $x = b$ and equal to zero 
elsewhere. Suppose the two observations are $x_1$ and $x_2$. 

Then 

$$\bar{x} = \frac{x_1 + x_2}{2}$$

and 

$$t = \frac{\sqrt{n}\,\bar{x}}{\sqrt{\dfrac{\sum (x - \bar{x})^2}{n-1}}} = \frac{\sqrt{2}\,\bar{x}}{\sqrt{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2}}.$$


¹ “Student”, “The Probable Error of a Mean”, Biometrika, Vol. VI (1908), pp. 1-25. 

² Paul R. Rider, “On the Distribution of the Ratio of the Mean to Standard Deviation 
in Small Samples from Non-Normal Universes”, Biometrika, Vol. XXI (1929), pp. 
124-143. 
The sample $(x_1, x_2)$ can be represented by a point in a square of side $b - a$, 
as point $P$ in Figure 1. 

The coordinates of $F$ are $(\bar{x}, \bar{x})$, so that 

$$OF = \sqrt{2}\,\bar{x}, \qquad FP = \sqrt{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2},$$

therefore 

$$t = \frac{OF}{FP} = \cot\theta.$$

Similarly for a point lying below $AB$, the value of $t$ is $\cot\theta$. Hence all 
points on $OS$ and its image $OR$ have the same value of $t$. 

Let $\alpha = \frac{3\pi}{4} - \theta$. Then $\tan\alpha = \frac{1+t}{1-t}$ and the equation of $OS$ is 

$$x_2 = \frac{t+1}{t-1}\,x_1.$$

The probability of getting a sample point in the element of area $dx_1\,dx_2$ is 
$f(x_1)f(x_2)\,dx_1\,dx_2$. Therefore the probability of getting a value of $t$ less than 
the value represented by a point on $OS$ is given by 

(1) $$2\int_a^0\int_{x_1}^{\frac{t+1}{t-1}x_1} f(x_1)f(x_2)\,dx_2\,dx_1.$$

By differentiating (1), we get the frequency function 

$$g(t) = \frac{4}{(t-1)^2}\int_0^a x_1 f(x_1)\,f\!\left(\frac{t+1}{t-1}\,x_1\right)dx_1.$$
However, this expression is valid only when $t < \cot\phi$, where $\phi$ is the angle 
between $OB$ and $OC$ in Figure 1. From Figure 1 we notice that 

$$\phi = \frac{\pi}{4} + \cot^{-1}\left(-\frac{b}{a}\right),$$

therefore 

$$\cot\phi = \frac{b+a}{b-a}.$$

Hence the above expression, $g(t)$, is valid for $t < \frac{b+a}{b-a}$. 

When $t > \frac{b+a}{b-a}$, the probability of obtaining a value of $t$ greater than the 
value represented by a point on $OS$ or its image $OR$, as in Figure 2, is given by 

$$2\int_0^b\int_{\frac{t-1}{t+1}x_2}^{x_2} f(x_1)f(x_2)\,dx_1\,dx_2,$$

and the distribution function is 

(2) $$1 - 2\int_0^b\int_{\frac{t-1}{t+1}x_2}^{x_2} f(x_1)f(x_2)\,dx_1\,dx_2.$$

After differentiating (2), we obtain the frequency function 

$$g(t) = \frac{4}{(t+1)^2}\int_0^b x_2 f(x_2)\,f\!\left(\frac{t-1}{t+1}\,x_2\right)dx_2.$$

Thus, the frequency function of $t$ for samples of two can be obtained from the 
expressions: 

(3) $$g(t) = \begin{cases} \dfrac{4}{(t-1)^2}\displaystyle\int_0^a x f(x)\,f\!\left(\dfrac{t+1}{t-1}\,x\right)dx & \text{for } t < \dfrac{b+a}{b-a} \\[2ex] \dfrac{4}{(t+1)^2}\displaystyle\int_0^b x f(x)\,f\!\left(\dfrac{t-1}{t+1}\,x\right)dx & \text{for } t > \dfrac{b+a}{b-a} \end{cases}$$

These expressions may also be used when $a$, or $b$, or both are infinite. However, 
the join point $t = \frac{b+a}{b-a}$ is then indeterminate, but by consideration of Figure 1, it 
can easily be seen that the join points are as follows: 

$a = -\infty$, $b$ finite: $t = -1$ 
$a$ finite, $b = +\infty$: $t = +1$ 


The expressions given by (3) have been verified by obtaining the distribution 
of t for samples of two drawn from the normal distribution and also from the 
rectangular distribution. The explicit distributions were found very easily from 
(3) by performing the integrations. 














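For instance when 

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}$$

we get 

$$g(t) = \frac{1}{\pi(1 + t^2)} \qquad (-\infty < t < +\infty)$$

which agrees with Student’s distribution for $n = 2$. 

Also when 

$$f(x) = \begin{cases} 1 & \text{for } -\tfrac{1}{2} \le x \le \tfrac{1}{2} \\ 0 & \text{elsewhere} \end{cases}$$

we get 

$$g(t) = \begin{cases} \dfrac{1}{2(t-1)^2} & \text{for } t < 0 \\[2ex] \dfrac{1}{2(t+1)^2} & \text{for } t > 0 \end{cases}$$

which agrees with the distribution found by Rider as corrected by Perlo.³ 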
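
Both checks can be repeated numerically. The Python sketch below evaluates the two branches of (3) with a simple midpoint rule and compares them with the closed forms $1/\pi(1+t^2)$ for the normal universe and $1/2(t \mp 1)^2$ for the rectangular universe on $(-\tfrac{1}{2}, \tfrac{1}{2})$. The function names, the truncation of the normal universe at $\pm 12\sigma$, and the choice of 0 as join point for the normal case are assumptions made for this illustration.

```python
import math

def integrate(func, lo, hi, steps=4000):
    """Midpoint rule; also works when hi < lo (the step is then negative)."""
    h = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * h) for i in range(steps)) * h

def g(t, f, a, b, join):
    """Frequency function of t for samples of two, following the branches of (3)."""
    if t < join:
        k = (t + 1.0) / (t - 1.0)
        return 4.0 / (t - 1.0) ** 2 * integrate(lambda x: x * f(x) * f(k * x), 0.0, a)
    m = (t - 1.0) / (t + 1.0)
    return 4.0 / (t + 1.0) ** 2 * integrate(lambda x: x * f(x) * f(m * x), 0.0, b)

def uniform(x):
    """Rectangular universe with zero mean on (-1/2, 1/2)."""
    return 1.0 if -0.5 <= x <= 0.5 else 0.0

def normal(x):
    """Normal universe with zero mean and unit standard deviation."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

for t in (-3.0, -0.5, 0.5, 2.0):
    rect = g(t, uniform, -0.5, 0.5, 0.0)
    norm = g(t, normal, -12.0, 12.0, 0.0)  # +-12 stands in for +-infinity
    print(t, rect, norm, 1.0 / (math.pi * (1.0 + t * t)))
```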


NEW YORK 

³ Victor Perlo, “On the Distribution of Student’s Ratio for Samples of Three Drawn 
from a Rectangular Distribution,” Biometrika, Vol. XXV (1933) pp. 203-204. 
ON SOME INFINITE SERIES INTRODUCED BY TSCHUPROW 
By J. B. D. DERKSEN 


In his fundamental work on the principles of the theory of correlation Tschuprow 
introduces some infinite series, leaving certain questions regarding their 
convergence or divergence unsolved.¹ 

As will be shown in the following note, these series are what may be termed 
randomly divergent,² that is, series involving random variables which may take 
on values which will make the series divergent. This result is of importance: 
e.g. the well-known formula for the standard error of a correlation coefficient, 
$\frac{1 - r^2}{\sqrt{N}}$, is the first term of an infinite series for which the question of 
convergence has not been carefully considered. 

Tschuprow finds himself confronted by infinite series when dealing with the 
mathematical expectations of quotients, e.g. correlation coefficients, or sums 
of quotients, e.g. the mean square contingency. Let us consider a two- 
dimensional discontinuous universe, where the variables are $x$ and $y$. Let $p_{ij}$ 
be the probability of the occurrence of the pair of values $x_i$, $y_j$. The probability 
that $x$ assumes the value $x_i$ equals $\sum_{j=1}^{l} p_{ij} = p_{i|}$ $(i = 1, \cdots, k;\ j = 1, \cdots, l)$. 
When taking a sample of $N$ pairs of observations $(x, y)$ the relative frequency 
of $x_i$ will be $p'_{i|}$, and that of the pair $(x_i, y_j)$ will be $p'_{ij}$. In accordance with 
Tschuprow we put $p'_{ij} = p_{ij} + dp_{ij}$ and $p'_{i|} = p_{i|} + dp_{i|}$, where $dp_{ij}$ and $dp_{i|}$ 
are random variables. 
As one of the simplest cases we consider the mathematical expectation of 

(1) $$\left(\frac{p'_{ij}}{p'_{i|}}\right)^2 = \left(\frac{p_{ij}}{p_{i|}}\right)^2\left(1 + \frac{dp_{ij}}{p_{ij}}\right)^2\left(1 + \frac{dp_{i|}}{p_{i|}}\right)^{-2}.$$

Now Tschuprow develops the last factor in an infinite binomial series, getting 

$$\left(\frac{p'_{ij}}{p'_{i|}}\right)^2 = \left(\frac{p_{ij}}{p_{i|}}\right)^2\left\{1 + 2\frac{dp_{ij}}{p_{ij}} - 2\frac{dp_{i|}}{p_{i|}} + \frac{dp_{ij}^2}{p_{ij}^2} - 4\frac{dp_{i|}\,dp_{ij}}{p_{i|}\,p_{ij}} + 3\frac{dp_{i|}^2}{p_{i|}^2} - \cdots\right\}.$$


He has given general formulae (Biometrika, vol. XII, p. 194 (1919)) from which 
the mathematical expectations of the terms of this infinite series may immediately 
be found. We get an infinite series containing ascending powers of $N$ 
in the denominator. Finally the convergence or divergence of this series has to 
be investigated, the problem left unsolved by Tschuprow. 

¹ A. A. Tschuprow, Grundbegriffe und Grundprobleme der Korrelationstheorie, Leipzig- 
Berlin, 1925, p. 85-97. An English translation was prepared by M. Kantorowitsch 
(Principles of the Theory of Correlation, W. Hodge & Co., 1939). 

² Cf. my Inleiding tot de correlatierekening, Delft, 1935, p. 88-90. 

The series expansion of $\left(1 + \frac{dp_{i|}}{p_{i|}}\right)^{-2}$, however, diverges for values $dp_{i|}$ such 
that $\left|\frac{dp_{i|}}{p_{i|}}\right| > 1$ and converges only if 

(2) $$\left|\frac{dp_{i|}}{p_{i|}}\right| < 1.$$
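
The behaviour of this binomial series can be seen numerically. The short Python sketch below forms partial sums of $(1+u)^{-2} = \sum_{k \ge 0} (-1)^k (k+1)\,u^k$, where $u$ stands for the ratio $dp_{i|}/p_{i|}$; the function name and the two sample values of $u$ are chosen here only for illustration.

```python
def partial_sums(u, n_terms):
    """Partial sums of the binomial series for (1 + u)**(-2)."""
    total, sums = 0.0, []
    for k in range(n_terms):
        total += (-1) ** k * (k + 1) * u ** k
        sums.append(total)
    return sums

inside = partial_sums(0.5, 60)    # |u| < 1: settles near (1.5)**(-2)
outside = partial_sums(1.5, 30)   # |u| > 1: oscillates with growing size

print(inside[-1], 1.0 / 1.5 ** 2)
print(outside[9], outside[19], outside[29], 1.0 / 2.5 ** 2)
```

For $u = 1.5$ the function $(1+u)^{-2}$ itself is perfectly finite; it is only the series representation that fails.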

This result is not affected by the procedure of the determination of mathematical 
expectations. For if $f(p'_{i|}, p'_{ij})$ is the probability distribution of $(p'_{i|}, 
p'_{ij})$, then we have to multiply (1) by this function and to sum for all possible 
values of $p'_{i|}$, $p'_{ij}$. As the expressions 


2 d —2 
fps, pi) Pa + “Ps 
Pil Pii 

are always positive, the infinite series, which results from replacing the terms of 
(1) by their mathematical expectations, will also be divergent. 
The same argument is true, when we consider for instance the mathematical 
expectation of the Pearson-Bravais correlation coefficient. Denoting by $\mu_{11}$, 
$\mu_{20}$, $\mu_{02}$ the population values of the product moment and the second order 
moments of $x$ and $y$, and by $\mu'_{11}$, $\mu'_{20}$, $\mu'_{02}$ the values observed in a sample, the 
mathematical expectation of the correlation coefficient may be found from 

$$E(r) = E\left[\frac{\mu'_{11}}{(\mu'_{20}\,\mu'_{02})^{1/2}}\right] = E\left[\frac{\mu_{11} + d\mu_{11}}{(\mu_{20} + d\mu_{20})^{1/2}(\mu_{02} + d\mu_{02})^{1/2}}\right] = \frac{\mu_{11}}{(\mu_{20}\,\mu_{02})^{1/2}}\,E\left[\frac{1 + \dfrac{d\mu_{11}}{\mu_{11}}}{\left(1 + \dfrac{d\mu_{20}}{\mu_{20}}\right)^{1/2}\left(1 + \dfrac{d\mu_{02}}{\mu_{02}}\right)^{1/2}}\right],$$

where $d\mu_{11}$, $d\mu_{20}$, and $d\mu_{02}$ are random variables. Tschuprow expands the 
denominator in binomial series. However, if $d\mu_{20}$ and $d\mu_{02}$ take on values such that 
$\left|\frac{d\mu_{20}}{\mu_{20}}\right| > 1$ or $\left|\frac{d\mu_{02}}{\mu_{02}}\right| > 1$, these series will again be divergent. Analogous 
difficulties arise in all other cases, where Tschuprow makes use of binomial 
expansions. 

It should also be remarked that the well-known formulae, given by the Biometric 
School for the standard errors of regression and correlation coefficients 
are equal to the mathematical expectation of the first terms of infinite series, 
which, as explained above, are divergent for certain values of the random 
variables. Therefore the question arises as to what effect the divergence for 
some of the values of the random variables has on those formulae. 
This question can be cleared up by the introduction of Slutsky’s conditionally 
aleatory variables.³ These are defined as follows. Suppose that an aleatory 
variable $z$ can assume the values $z_1, z_2, \cdots, z_r$ with probabilities $p_1, p_2, \cdots, p_r$. 
Now we put some of these probabilities equal to 0, dividing the remaining ones 
by $1 - Q$, if $Q$ represents the total of all the reduced probabilities. The variable 
$z$ then becomes the so-called conditionally aleatory variable $z'$. Moreover we 
assume that $z$ converges stochastically to some limit. Then Slutsky has shown 
that if $Q$ converges to 0, $z'$ will converge to the same stochastical limit as $z$. 
Moreover the ratio of corresponding moments of the distributions of $z$ and $z'$ 
will tend to unity. 

Now let us consider for example 

$$z = \frac{p'_{ij}}{p'_{i|}} = \frac{p_{ij} + dp_{ij}}{p_{i|} + dp_{i|}}.$$

Omitting the values for which $|dp_{i|}| > p_{i|}$, we get a conditionally aleatory 
variable $z'$ instead of $z$. However, according to the theorem mentioned before, 
$z'$ and $z$ will converge to the same stochastical limit, since the probability that 
$|dp_{i|}| > p_{i|}$ converges stochastically to zero as the number of observations 
increases indefinitely. 

In the same way we consider 

$$r = \frac{\mu'_{11}}{(\mu'_{20}\,\mu'_{02})^{1/2}} = \frac{\mu_{11} + d\mu_{11}}{(\mu_{20} + d\mu_{20})^{1/2}(\mu_{02} + d\mu_{02})^{1/2}},$$

and omit those values for which $|d\mu_{20}| > \mu_{20}$ and $|d\mu_{02}| > \mu_{02}$. 

If now we consider the binomial expansions for the conditionally aleatory 
variables and determine the mathematical expectations of the terms, these new 
series will converge. All terms in these convergent series will be smaller than 
the corresponding terms in Tschuprow’s series, because we have omitted the 
larger values of the $dp$’s and the $d\mu$’s. However, if the number of observations 
increases indefinitely the ratios between corresponding terms tend to unity, 
because the probabilities that e.g. $|d\mu_{20}| > \mu_{20}$ or $|d\mu_{02}| > \mu_{02}$ converge to zero. 

Let us now turn again to the infinite series given by Tschuprow (loc. cit. p. 90) 
for the square of the standard error of a correlation coefficient. 

(3) $$\sigma_r^2 = E[r - E(r)]^2 = \frac{\xi_1}{N} + \frac{\xi_2}{N^2} + \frac{\xi_3}{N^3} + \cdots$$

Here $\xi_1, \xi_2, \xi_3, \cdots$ represent rather lengthy expressions, for which we may 
refer to Tschuprow’s book (loc. cit. p. 88-90). As we have seen before, this 
series is randomly divergent. However, by introducing a conditionally aleatory 
variable in the way described above, expanding it into an infinite series and 


³ Slutsky, “Über stochastische Asymptoten und Grenzwerte,” Metron 1925, Vol. V. 
No. 3, p. 79. Also my Inleiding tot de correlatierekening, Ch. V. and VI. 
determining the mathematical expectations of its terms, we get a convergent 
series, say: 

(4) $$\sigma_r'^2 = \frac{\xi'_1}{N} + \frac{\xi'_2}{N^2} + \frac{\xi'_3}{N^3} + \cdots$$

From Slutsky’s theorem, mentioned before, it follows that if $N$ increases the 
ratio of $\sigma_r'^2$ and $\sigma_r^2$ will tend to unity. Moreover, if we take $N$ sufficiently large, 
it will always be possible to fulfill the following inequalities: 

$$\left|\frac{\xi_k}{\xi'_k} - 1\right| < \epsilon_k \qquad (k = 1, 2, \cdots, n)$$

where $\epsilon_k$ $(k = 1, 2, \cdots, n)$ and $n$ are arbitrary. Therefore, when $n$ and $N$ are 
sufficiently large the ratio between the first $n$ terms of the infinite series (3) 
and the true value of $\sigma_r^2$ will differ from 1 by an arbitrarily small number. Though 
the series (3) is divergent for any $N$, however large, the first $n$ terms of this 
series will give an approximation of $\sigma_r^2$ by taking $N$ sufficiently large. 

In this paper we have shown that the procedures which have been followed 
by the Biometric School and Tschuprow to establish formulas for the standard 
errors of correlation and regression coefficients and in analogous problems can 
be made rigorous by the use of conditionally aleatory variables. It was found 
that their infinite expansions are divergent for some of the values of the random 
variables involved, however large the number of observations (N) may be. Yet 
it could be demonstrated, that the first n terms of these series will give an ap- 
proximation, as close as is wanted, if N is sufficiently large. For practical pur- 
poses the case n = 1 is the most important. 
In a recent paper [1] Bartlett has written a further justification of his criticism 
of the test of significance for the difference between means of two samples from 
normal populations not supposedly of equal or related variance. This test was 
originally put forward by W. V. Behrens [2], and later [3] found to be very 
simply derivable by the method of fiducial probability. 

It is unfortunate that Bartlett did not restate his own views on this topic 
without making misleading allusions to mine. Thus, on p. 135 in [1]: 


“It is sufficient to note that the distribution certainly provides us with an exact inference of 
fiducial type, as Fisher himself confirmed [9], p. 375.”’ 


I do not know, and Bartlett does not specify, what unguarded statement of 
mine could be used to justify this assertion. From the time I first introduced 
the word, I have used the term fiducial probability rather strictly, in accordance 
with the basic ideas of the theory of estimation. Several other writers have 
preferred to use it in a wider application, without the restrictions which I think 
are appropriate. To all, I imagine, it implies at least a valid test of significance 
expressible in terms of an unknown parameter, and capable of distinguishing, 
therefore, those values for which the test is significant, from those for which 
it is not. 

Shortly after Bartlett’s alternative approach to the problem was put forward 
[4], I expressed [5] the following opinion. As this occurs prominently in the 
summary, indeed on the very page to which Bartlett refers in his quotation 
above, I cannot suppose he has overlooked it, though evidently he must have 
missed its meaning. I wrote as follows [5] p. 375: 

“The criticism of Behrens’ test of significance, recently put forward by Bartlett, on the 
ground that it differs from a possible alternative test, overlooks the inconsistency of 
assuming for the unknown variances both (a) fiducial distributions in accordance with the 
samples observed, and (b) values fixed from sample to sample. 

The alternative test of significance proposed involves, when the variance ratio of the two 
populations sampled is unknown, the choice by lot between the value $T$, used in Behrens’ 
test, and a second value $T'$, which reverses the order of significance of different possible 
sets of observations. High values of $T'$ are not, therefore, by themselves evidence of 
inequality of the means.” 


I submit that the second paragraph quoted above shows, without further 
argument, that I rejected Bartlett’s proposed test of significance, and therefore 
that I did not confirm his opinion that it provided “‘an exact inference of fiducial 
type.”’ Whether my reasons for doing so were strong or weak is, of course, 
another matter. 

What may have led Bartlett to adopt his test of significance is its formal 
similarity to one appropriate to a different problem. In 1908 “Student” in his 
now celebrated paper on ‘“The probable error of a mean’’ [6] applied his solution 
to what are known as paired observations. Two treatments A and B are 
applied each to one of a number of pairs of plots, or other experimental units, 
the members of each pair being chosen to be in other respects closely comparable, 
although the circumstances of the different pairs are not necessarily closely alike. 
In order to allow for any, possibly large, variations in the conditions prevailing 
in the different pairs, attention is confined to the difference, having regard to 
sign, supplied by each pair. 

Thus, if pairs of measurements $a_1, b_1, a_2, b_2, \cdots$ are obtained, we may write 

$$d_r = a_r - b_r,$$

and test the hypothesis that the differences $d$ are a normal sample having zero 
mean. This hypothesis will be true if $a_r$ and $b_r$ are distributed, by experimental 
error, in normal distributions having the same mean, even though this mean is 
not the same for different pairs. It will be true if the variances of $a$ and $b$ from the 
hypothetical mean of the pair are unequal, provided these variances are the same 
from pair to pair. These are the reasons which make the hypothesis that $d$ is 
normally distributed about zero appropriate for testing the differential effect of 
the treatments. 

If only two pairs are used, “Student’s” test reduces to 

$$t = \frac{d_1 + d_2}{d_1 - d_2}.$$

There is one degree of freedom, so that $t$ is distributed in Cauchy’s distribution 

$$\frac{1}{\pi}\,\frac{dt}{1 + t^2}.$$

If, now, the symbols have a different meaning, so that $a_1$ and $a_2$ are a sample of 
two from a single normal distribution, and $b_1$ and $b_2$ a second sample from a 
different population, having by hypothesis an independent variance, Behrens’ 
problem (limited for comparison with Bartlett to samples of 2) is to test whether 
the two populations can be regarded as having the same mean, or whether there 
is reason to regard the means also as being different. Note that the pairs 1 and 2 
are not supposed to differ in treatment or situation. The difference $a_1 - a_2$ 
is not to be ascribed partly to differences between the hypothetical means of 
these pairs, but wholly to the error variance of the observations $a$, about which 
it is the only source of information; the like is true of the difference $b_1 - b_2$. 
The sign of these two differences is arbitrary; only their positive values concern 
our problem. There is no real correspondence between the suffices assigned to 
the two pairs of letters. They could be interchanged for $a$, and not for $b$, without 
affecting the problem. 
Behrens’ test reduces for this case to taking 

$$T = \frac{a_1 + a_2 - b_1 - b_2}{|a_1 - a_2| + |b_1 - b_2|},$$

using for the probability function “Student’s” distribution for one degree of 
freedom. Bartlett’s test involves choosing at random between $T$ and $T'$, where 

$$T' = \frac{a_1 + a_2 - b_1 - b_2}{\bigl|\,|a_1 - a_2| - |b_1 - b_2|\,\bigr|}.$$

It will be noticed that, if $|b_1 - b_2| < |a_1 - a_2|$, and if, keeping $b_1 + b_2$ 
constant, $|b_1 - b_2|$ is increased, a change which must make us suspect larger 
errors, and therefore a lower significance, the value of the difference $|a_1 - a_2| - 
|b_1 - b_2|$ is continuously diminished, and that of $T'$ continuously increased, 
without limit. In fact, the probability of exceeding any limit of significance, 
however high, may be made to exceed 50% by this process. The order of 
significance of such a series of possible observations is thus reversed. The 
fact that choosing at random between $T$ and $T'$ will give us a quantity which, on the 
null hypothesis, is distributed in “Student’s” distribution is, thus, insufficient to 
justify its use as a test of significance. 
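
The reversal can be made concrete with a few numbers. The Python sketch below takes $T = (a_1 + a_2 - b_1 - b_2)/(|a_1 - a_2| + |b_1 - b_2|)$ and $T'$ with the absolute difference of the two ranges in the denominator, holds $a_1$, $a_2$ and $b_1 + b_2$ fixed, and widens $|b_1 - b_2|$ towards $|a_1 - a_2|$. The observations and function names are invented for this illustration.

```python
def behrens_T(a1, a2, b1, b2):
    """T for two samples of two: sum of the two ranges in the denominator."""
    return (a1 + a2 - b1 - b2) / (abs(a1 - a2) + abs(b1 - b2))

def alternative_T_prime(a1, a2, b1, b2):
    """T': absolute difference of the two ranges in the denominator."""
    return (a1 + a2 - b1 - b2) / abs(abs(a1 - a2) - abs(b1 - b2))

a1, a2 = 10.0, 6.0                      # |a1 - a2| = 4 (hypothetical data)
spreads = [0.5, 1.0, 2.0, 3.0, 3.9]     # |b1 - b2|, with b1 + b2 = 10 throughout

T_values, T_prime_values = [], []
for s in spreads:
    b1, b2 = 5.0 + s / 2.0, 5.0 - s / 2.0
    T_values.append(behrens_T(a1, a2, b1, b2))
    T_prime_values.append(alternative_T_prime(a1, a2, b1, b2))
    print(s, T_values[-1], T_prime_values[-1])
```

As the second sample becomes more scattered, $T$ falls while $T'$ rises, which is the reversal of the order of significance described above.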
It is also irrelevant, and this may be at the present time the most important 
point to make, that the sampling distribution of $T$ above is not given by 
“Student’s” distribution, if the population to which statements of probability refer 
is supposed to consist of samples taken repeatedly from populations having a 
fixed variance ratio. Such a supposition, as I noted in the passage quoted above, 
is inconsistent with the fiducial distributions derived from the samples. Bartlett 
comes near to discussing this point on p. 136 in [1]. He says: 


‘‘While Fisher suggests that this in no way invalidates his fiducial argument, in my view 
if an inference is to be independent of an unknown parameter, it should in particular be 
independent of it if we imagine that we are being supplied with pairs of samples, for all of 
which the ratio has the same value.”’ 


In its natural meaning this statement seems to be true. The problem concerns 
what inferences are legitimate from a unique pair of samples, which supply the 
data, in the light of the suppositions we entertain about their origin; the legiti- 
macy of such inferences cannot be affected by any supposition as to the origin 
of other samples which do not appear in the data. Such a population of samples 
is really extraneous to the discussion. Nor has Bartlett shown that Behrens’ 
inference from a unique pair of samples is so affected. What he seems to rely on 
is that an aggregate of samples fulfilling the null hypothesis, but drawn from 
pairs of populations having a fixed variance ratio, will show differences between 
their means exceeding the limits fixed by the test for significance, with a fre- 
quency other than that indicated by the test. This, however, is a circumstance 
common to all the well known tests of significance, and has been obvious from 
their very origin. 

In “Student’s” test for significance, for example, if a sample of $n'$ observations 
is taken from a population normally distributed about zero, we calculate 

$$\bar{x} = \frac{1}{n'}S(x), \qquad n = n' - 1, \qquad s^2 = \frac{1}{n}S(x - \bar{x})^2,$$

and count $\bar{x}$ as significant, if 

$$|\bar{x}| > s\,t_n/\sqrt{n'},$$

where $t_n$ is “Student’s” $t$ for $n$ degrees of freedom, corresponding to the 
level of significance chosen. 

However, in repeated samples of $n'$ from a population having a given variance 
$\sigma^2$, it is highly improbable that $\bar{x}$ will exceed the limit assigned with the frequency 
chosen. The limit it will exceed with this frequency is 

$$\sigma\,t_\infty/\sqrt{n'},$$

which will usually differ from that assigned from the sample. This, however, 
has not hitherto been considered an adequate reason for calling the test inaccurate, 
or biased. It is merely a recognition of the fact that, if we did know $\sigma$, 
we could make a better test. Just as, in Behrens’ problem, if we knew the 
relative weights $w$ of the observations in the two samples, we could make a 
weighted “Student’s” test, and should be wise to do so, if the information were 
available. 

Naturally, it may be said that although the limit of significance assigned to $\bar{x}$ 
will not be verified in repeated sampling from populations having the same 
variance, the distribution of $t$ will be so verified. In this respect the distribution 
of $t$ in “Student’s” case is analogous to the simultaneous distribution of $t_1$ and $t_2$ 
in Behrens’ case, where 

$$t_1 = \frac{\bar{x}_1 - \mu}{s_1}, \qquad t_2 = \frac{\bar{x}_2 - \mu}{s_2},$$

$\mu$ is the hypothetical common mean of the two populations, and $s_1^2$ and $s_2^2$ are 
the estimated variances of the means of the two samples. The quantity $d$ 
which Sukhatmé [7] has conveniently tabulated, in such a way that 

$$d\sqrt{s_1^2 + s_2^2}$$

supplies a significance limit for $\bar{x}_1 - \bar{x}_2$, naturally does not possess the property 
that 

$$|\bar{x}_1 - \bar{x}_2| > d\sqrt{s_1^2 + s_2^2}$$

with the probability assigned, in a population consisting of pairs of samples from 
populations having the same variance ratio. 
If the populations were fixed, the corresponding limit would be 

$$t_\infty\sqrt{\sigma_1^2 + \sigma_2^2},$$

and if the variance ratio were fixed so that $w$ is the weight of $\bar{x}_2$ relative to that of 
$\bar{x}_1$, it would be 

$$t_{n_1+n_2}\sqrt{\frac{n_1 s_1^2 + w\,n_2 s_2^2}{n_1 + n_2}\left(1 + \frac{1}{w}\right)},$$

provided always, if we wish to express ourselves in terms of repeated sampling, 
that the absolute values of $\sigma_1$, or $\sigma_2$ were fiducially distributed. Behrens’ 
problem refers to the case in which neither the variances nor their ratio is 
known, so that the unknown variances, independently, must be given their 
fiducial distributions. 

In this note I have not touched on the logical background of Behrens’ test, 
or the practical conditions on which it is appropriate, since I have recently 
discussed these more fully [8]. Recently also [9] Yates has given a careful 
explanation of the basis of the test. 
SUMMARY 

The statement of Bartlett that the author (Fisher) has confirmed that Bartlett’s 
approach to Behrens’ problem provides an exact inference of fiducial type 
is incorrect. The only exact test appropriate to his problem seems to be that 
given by Behrens. 
A NOTE ON NEYMAN’S THEORY OF STATISTICAL ESTIMATION 


By SOLOMON KULLBACK 


In this note we shall examine a section of a recent paper by Neyman¹ dealing 
with statistical estimation. Consider the following quotation from that section² 
which deals with the statement of the problem: 

“Consider the variables [$x_1, x_2, \cdots, x_n$] and assume that the form of the 
probability law [$p(x_1, \cdots, x_n \mid \theta_1, \theta_2, \cdots, \theta_l)$] is known, that it involves the 
parameters $\theta_1, \theta_2, \cdots, \theta_l$ which are constant (not random variables), and that 
the numerical values of these parameters are unknown. It is desired to estimate 
one of these parameters, say $\theta_1$. By this I shall mean that it is desired to define 
two functions $\bar\theta(E)$ and $\underline\theta(E) \le \bar\theta(E)$, determined and single valued at any point 
$E$ of the sample space, such that if $E'$ is the sample point determined by observation, 
we can (1) calculate the corresponding values of $\underline\theta(E')$ and $\bar\theta(E')$ and (2) 
state that the true value of $\theta_1$, say $\theta_1^0$, is contained within the limits 

$$\underline\theta(E') \le \theta_1^0 \le \bar\theta(E') \qquad (18)$$

this statement having some intelligible justification on the ground of the theory 
of probability. 


1 Specifically we refer to J. Neyman ‘‘Outline of a Theory of Statistical Estimation Based 
on the Classical Theory of Probability,’’ Phil. Trans. Roy. Soc., vol. A236 (1937), pp. 
333-380. 

2 J. Neyman, loc. cit., p. 347. The material in brackets are slight alterations of the 
original text in order that the quotation do not refer to previous matter in the original 
paper. 







This point requires to be made more precise. Following the routine of thought 
established under the influence of the Bayes Theorem, we could ask that, given 
the sample point $E'$, the probability of $\theta_1^0$ falling within the limits (18) should be 
large, say $\alpha = 0.99$, etc. If we expressed this condition by the formula 

$$P\{\underline\theta(E') \le \theta_1^0 \le \bar\theta(E') \mid E'\} = \alpha \qquad (19)$$

we see at once that it contradicts the assumption that $\theta_1^0$ is constant. In fact, 
on this assumption, whatever the fixed point $E'$ and the values $\underline\theta(E')$ and $\bar\theta(E')$, 
the only values the probability (19) may possess are zero and unity. For this 
reason we shall drop the specification of the problem as given by the condition 
(19).” 

We believe that the following approach to the problem emphasizes to a greater 
extent the fact that if the practical statistician follows the steps recommended 
as a result of Neyman’s solution, then ‘in the long run he will be correct in about 
100$\alpha$ percent of all cases’. 

Let us return again to the condition (19) of the quotation, and write 

(1) $$\pi(E) = P\{\underline\theta(E) \le \theta_1^0 \le \bar\theta(E) \mid E\}$$

where of course $\pi(E)$ = zero or unity according as the true value of $\theta_1$, say $\theta_1^0$, 
does not or does satisfy the inequality 

(2) $$\underline\theta(E) \le \theta_1^0 \le \bar\theta(E).$$

We may however calculate the average value of $\pi(E)$, i.e., the percentage of cases 
in which in the long run the statistician will be correct.³ In accordance with 
the definition of an average 

(3) $$\bar\pi(E) = \int_R \pi(E)\,p(E \mid \theta_1^0, \theta_2, \cdots, \theta_l)\,dx_1\,dx_2 \cdots dx_n$$

where the region $R$ is the entire sample space. If we let $R_1$ be that portion of the 
sample space for which (2) is satisfied, then since $\pi(E) = 1$ if $E$ falls in $R_1$ and 
zero otherwise 

(4) $$\bar\pi(E) = \int_{R_1} p(E \mid \theta_1^0, \theta_2, \cdots, \theta_l)\,dx_1\,dx_2 \cdots dx_n.$$

Thus, if we want our rule to lead to a correct statement in 100$\alpha$ percent of cases 
in the long run, we must look for two functions $\underline\theta(E)$ and $\bar\theta(E)$ such that for the 
corresponding region $R_1$ 

(5) $$\bar\pi(E) = \int_{R_1} p(E \mid \theta_1^0, \theta_2, \cdots, \theta_l)\,dx_1\,dx_2 \cdots dx_n = \alpha$$

holds good whatever the value $\theta_1^0$ of $\theta_1$ and whatever the values of the other 
parameters $\theta_2, \theta_3, \cdots, \theta_l$ involved in the probability law of the $X$’s may be. 


³ Cf. A. Wertheimer, “A Note on Confidence Intervals and Inverse Probability,” Annals 
Math. Statistics, Vol. X (1939), pp. 74ff. 
If we apply to the preceding the calculus of probability in accordance with 
Neyman,⁴ we find that (5) may be written as 

(6) $$\bar\pi(E) = P\{\underline\theta(E) \le \theta_1^0 \le \bar\theta(E) \mid \theta_1^0\} = \alpha$$

which, with the conditions stated for (5), is identical with formula (20) on page 
348 of Neyman’s paper. 
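
The long-run reading of this result can be illustrated by simulation. In the Python sketch below, each sample $E$ gives an indicator that is either zero or unity, and the average of these indicators over many samples approximates the chosen level whatever the true parameter value. The setting is an assumption made only for illustration: a normal population with known standard deviation, limits $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ for the mean, and hypothetical names throughout.

```python
import math
import random

def coverage(theta0, sigma=1.0, n=5, z=1.96, trials=20000, seed=0):
    """Average of the 0/1 indicator that the limits contain the true value theta0."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(theta0, sigma) for _ in range(n)) / n
        # For a given sample the statement is simply right (1) or wrong (0).
        hits += (xbar - half_width <= theta0 <= xbar + half_width)
    return hits / trials

for theta0 in (-3.0, 0.0, 7.0):
    print(theta0, coverage(theta0))  # each average should be close to 0.95
```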


THE GEORGE WASHINGTON UNIVERSITY 
WASHINGTON, D. C. 


4 J. Neyman loc. cit. pp. 333-343. 


A NOTE ON A PRIORI INFORMATION 


By C. EISENHART 


A survey of recent literature on mathematical statistics is sufficient to reveal 
the fact that in approaching certain types of problems some writers assume 
more information known a priori than do other writers. Indeed, it soon becomes 
evident that great care is necessary in wording (and in reading) propositions in 
mathematical statistics. Furthermore, propositions which are true and powerful 
when certain information is known a priori may become either useless or 
irrelevant according as more, or less, information is available a priori. Once 
this situation is appreciated some apparent contradictions are resolved, and 
certain exceptional examples can “be reasonably regarded as bearing out the 
principle to which formally they are anomalous.” 

So far as I know it was Bartlett [1, p. 271] who first clearly pointed out how a 
slight change in the amount of information known a priori can greatly alter 
the complexion of a problem. He was indebted to Neyman and Pearson 
[5, p. 122] for his problem, which was to develop a test of the statistical hypothesis, 
$H_0$, that $\beta = \beta_0$ and $\gamma = \gamma_0$ for a random sample from the distribution 

(1) $$p(x) = \begin{cases} \beta e^{-\beta(x - \gamma)} & \text{for } x \ge \gamma \\ 0 & \text{for } x < \gamma. \end{cases}$$

If (1) expresses all the information (about the distribution of $x$) that is to be 
considered as known a priori, any value of $\beta > 0$ and any finite value of $\gamma$ 
being admissible, then it follows¹ immediately from a result of R. A. Fisher’s 
[2, p. 295] that no uniformly most powerful test, in the sense of Neyman and 
Pearson [4; 5, p. 115], can exist for $H_0$, since $H_0$ involves the simultaneous 
testing of two unrelated parameters. 


1 Since Fisher’s wording is important it will be well to quote him here: ‘‘It is evident, 
at once, that such a system [of maximum likelihood relations needed to insure the existence 
of a uniformly most powerful test] is only possible when the class of hypotheses considered 
By assuming that in addition to (1) the a priori information includes the 
knowledge that $\beta \ge \beta_0$ and $\gamma \le \gamma_0$ constitute the only admissible ranges of 
values of these parameters, Neyman and Pearson [5, p. 122] have succeeded in 
showing that a uniformly most powerful test of $H_0$ does exist when the 
admissible values of $\beta$ and $\gamma$ are restricted in this way. At first this appears to be 
in contradiction to Fisher’s statement referred to above, but Bartlett [1, p. 271] 
points out that the restrictions on the admissible values of $\beta$ and $\gamma$ reduce the 
problem effectively to one of testing a single parameter: In the first place, no 
statistical test is necessary if an observation less than $\gamma_0$ occurs, since this 
refutes the hypothesis $H_0$ immediately. Therefore, a statistical test of $H_0$ is 
needed only when none of the observations are less than $\gamma_0$, and for such 
observations the distribution law is 

(2) $$p(x) = \beta e^{-\beta(x - \gamma_0)}, \qquad x \ge \gamma_0,$$

and is independent of $\gamma$. In consequence, the test reduces to testing the single 
parameter $\beta$ in (2), for which the arithmetic mean, $\bar{x}$, is a sufficient statistic. 
The discovery of a uniformly most powerful test of $H_0$, when the above restrictions 
are placed on the admissible values of $\beta$ and $\gamma$, is, therefore, reasonably 
consistent with the full meaning of Fisher’s statement. 

The preceding example makes quite clear how a little additional a priori 
knowledge can affect the solution of a problem in mathematical statistics. 
The a priort knowledge employed by writers in mathematical statistics usually 
falls into one of the following categories: 

(i) The elementary probability law is taken to be continuous or discrete, 
as the case may be, but its mathematical form is left unspecified. 

(ii) The elementary probability law is taken to be of a definite mathemati- 
cal form involving one or more parameters the value(s) of which is (are) not 
considered as known a priori, and any value(s) of this (these) parameter(s) 
consistent with the non-negative character of a probability law is (are) 
admissible. 

(iii) Here the information assumed known is as in (ii) except that the 
admissible values of the parameter(s) form (a) restricted sub-set (or sub-sets) 
of the values admissible in (ii), such subsets, however, being comprised of 
more than a single value. 

(iv) The information is so complete that the admissible values of the 
parameter(s) have (a) known a priori probability distribution(s)—if a param- 


involves only a single parameter 6, or, what comes to the same thing, when all the param- 
eters entering into the specification of the population are definite functions of one of 
their number. In this case, the regions defined by the uniformly most powerful test of 
significance are those defined by the estimate of the maximum likelihood, T. For the test 
to be uniformly most powerful, moreover, these regions must be independent of 6, showing 
that the statistic must be of the special type distinguished as sufficient.’’ (Words in 
square brackets are mine.—C. E.) 
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eter @ is known to have a definite value 6’, then the a priori probability law 

of 6 can be taken as (Prob. 6 equals 6’) = 1, (Prob. @ not equal to 6’) = 0.° 
As statistical theory advances it may become necessary to classify problems 
according to the amount of information which may be assumed known a priori, 
when proceeding to their solution. No claim is made here that the above 
categories are the best to choose, but it may prove fruitful to study the extent to 
which results obtained with a certain amount of information assumed known are 
useful when more, less, or perhaps different, information is taken as known 
a priori. In particular, as the preceding example shows, it may be well to 
investigate exactly what are the implications of restricting the ranges of the 
admissible values of parameters. 
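
The reduction in the preceding example can be sketched numerically. The Python below is a minimal illustration under stated assumptions, not a transcription of the tests in the cited papers: it assumes the admissible ranges $\beta \ge \beta_0$, $\gamma \le \gamma_0$, treats any observation below $\gamma_0$ as refuting the hypothesis outright, and otherwise rejects for small values of the mean, using the fact that the sum of $n$ independent exponential variables has an Erlang distribution. The function names `erlang_cdf` and `restricted_test_p_value` are hypothetical.

```python
import math

def erlang_cdf(t, n, rate):
    """P(S <= t) where S is the sum of n independent exponentials with the given rate."""
    if t <= 0:
        return 0.0
    lam = rate * t
    term, partial = 1.0, 1.0          # k = 0 term of sum_{k<n} lam^k / k!
    for k in range(1, n):
        term *= lam / k
        partial += term
    return 1.0 - math.exp(-lam) * partial

def restricted_test_p_value(sample, beta0, gamma0):
    """Sketch of the one-parameter test left after restricting the admissible ranges.

    Assumption: alternatives have beta > beta0, so small means count against H0.
    """
    if min(sample) < gamma0:
        return 0.0                    # an observation below gamma0 refutes H0 at once
    excess = sum(x - gamma0 for x in sample)   # n * (mean - gamma0)
    return erlang_cdf(excess, len(sample), beta0)

# Hypothetical data: one observation at log 2 with beta0 = 1, gamma0 = 0.
p = restricted_test_p_value([math.log(2)], 1.0, 0.0)
```

Only the mean enters the second branch, which is the sense in which the problem has become one of a single parameter.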

It is unwise to attempt to predict the outcome of such research at this time, 
but it is probably safe to say that an increase in a priori information will gen- 
erally render possible better tests of significance—better in the sense that, for a 
given probability of rejecting the hypothesis tested when true, the probability of 
rejecting it when false will be greater—and narrower confidence intervals for a 
given confidence level. The example already given, concerned with a test of 
significance, supports this conjecture. As a further example, from the point of 
view of estimation, we may recall that it is possible with a level of confidence 
equal to .96875 to assert [3, p. 4] that the true median of the population from 
which a random sample of 6 was drawn lies within the observed range of the 
sample, and this without any assumption about the population except that it is 
continuous. If, however, the population is known to be of normal form with
unknown mean, $m$, and standard deviation, $\sigma$, then Student's $t$ will provide the
narrowest confidence intervals for the median of the population, since $t$ provides
[6, p. 378] the best available confidence intervals for the mean, $m$, (which is also
the median) of a normal population when $\sigma$ is unknown; if the population is
normal and $\sigma$ is known, then the normal deviate $(\bar{x} - m)\sqrt{6}/\sigma$ will supply still
narrower confidence limits for $m$.
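
The figure .96875 quoted above can be checked directly. For a continuous population each observation falls below the median with probability 1/2, and the sample range fails to cover the median only when all observations fall on the same side of it. The short Python sketch below (the function name is hypothetical) evaluates that probability exactly for a sample of 6.

```python
from fractions import Fraction

def range_covers_median(n):
    """Probability that the range of n observations from a continuous
    population contains the population median."""
    all_on_one_side = 2 * Fraction(1, 2) ** n   # all below, or all above
    return 1 - all_on_one_side

level = range_covers_median(6)   # Fraction(31, 32), i.e. 0.96875
```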

In conclusion, the circumstances under which it may be desired to apply 
methods of statistical inference may differ considerably in the amount of knowl- 
edge available to the research worker a priori, and the most efficient tests of 
significance and methods of estimation applicable to a given case will depend 
upon the nature of the available information as described in the above classification.
In comparing the procedures of different writers, therefore, it is most
important to examine their premises and see how much information each is 
considering as known at the start. 
A NOTE ON COMPUTATION FOR ANALYSIS OF VARIANCE


By MORRIS C. BISHOP


The method of computation for analysis of variance commonly favored is one 
which involves obtaining the total and total sum of squares in a single operation 
on a computing or card-punch machine,¹ in which case a check on the accuracy
of the work requires complete recomputation. But the best tools available 
to the student, and sometimes to the experimenter, are a table of squares and 
perhaps a listing machine. In such a situation, a simple algorithm which 
embodies checks on the computations is urgently needed. The method here 
presented reduces the arithmetic to repeated application of a single procedure, 
with adequate checks; it reveals rather than obscures the sample variances, 
which may or may not be of primary importance; and it provides an intuitively 
logical portrayal of the step-by-step improvement of the estimate of population 
variance. 

The data items and their squares may be merged into a single table by setting 
them down in staggered fashion, as shown in Table I. If only a single criterion 
of classification is to be used—classified into columns, say—the columns are 
summed down, and then these totals across (obtained as two sets of subtotals 
and totals on a listing machine). This yields the grand total ($T$) and total
sum of squares ($\sum_{i=1}^{N} \sum_{j=1}^{k} X_{ij}^2$). Summing across and down verifies the addition
and provides material for two-way classification. The total sum of squares of
deviations is obtained by the familiar formula

$$
\sum_{i=1}^{N} \sum_{j=1}^{k} (X_{ij} - \bar{X})^2 = \sum_{i=1}^{N} \sum_{j=1}^{k} X_{ij}^2 - \frac{T^2}{Nk}, \tag{1}
$$



where Nk is the total number of observations in N rows and k columns. 


¹ See George W. Snedecor, Analysis of Variance and Covariance, and Paul R. Rider,
Modern Statistical Methods.
To obtain the sum of squares of deviations within columns, each data column
total ($T_{.j}$) is squared and divided by the number of items in the column. Then
this correction ($T_{.j}^2/N_j$) is subtracted from the sum of squares for the column
and the differences are summed across, the formula for this summation being

$$
\sum_{j=1}^{k} \sum_{i=1}^{N_j} (X_{ij} - \bar{X}_{.j})^2 = \sum_{j=1}^{k} \left( \sum_{i=1}^{N_j} X_{ij}^2 - \frac{T_{.j}^2}{N_j} \right). \tag{2}
$$

This procedure is unvarying, whether the classes contain the same number of
observations or not. The sample variance for each column is immediately obtainable,
if desired, by dividing by the appropriate degrees of freedom ($N_j - 1$).
Short-cut machine methods completely obscure these sums of squares of deviations
for the individual classes.

The sum of squares of deviations between classes is obtained by adding
across the correction terms just used and subtracting from this total the correction
term of Equation (1). That is,

$$
\sum_{j=1}^{k} N_j (\bar{X}_{.j} - \bar{X})^2 = \sum_{j=1}^{k} \frac{T_{.j}^2}{N_j} - \frac{T^2}{Nk}. \tag{3}
$$

"Within classes" plus "Between classes" will, of course, check against the
"Total".
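
The single-criterion procedure can be sketched in a few lines of Python. This is an illustration only: the data are hypothetical (they are not taken from Table I or Table II), the function name is invented, and the total number of observations is counted directly so that unequal class sizes are handled.

```python
def one_way_sums_of_squares(classes):
    """classes: one list of observations per class; sizes may differ."""
    grand_total = sum(sum(c) for c in classes)                 # T
    n_obs = sum(len(c) for c in classes)                       # total number of observations
    correction = grand_total ** 2 / n_obs                      # correction term of Equation (1)
    total_ss = sum(x * x for c in classes for x in c) - correction
    class_corrections = [sum(c) ** 2 / len(c) for c in classes]    # T_j^2 / N_j
    within_by_class = [sum(x * x for x in c) - corr
                       for c, corr in zip(classes, class_corrections)]
    between_ss = sum(class_corrections) - correction           # Equation (3)
    return {
        "total": total_ss,
        "within_by_class": within_by_class,                    # terms of Equation (2)
        "within": sum(within_by_class),
        "between": between_ss,
    }

# Hypothetical data with unequal class sizes.
result = one_way_sums_of_squares([[2, 4, 6], [1, 3], [5, 5, 5, 9]])
```

Dividing each entry of `within_by_class` by its class size less one gives the sample variances that the text says are kept in view by this method.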


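If the classification is into rows, the appropriate formulas are:

Within rows

$$
\sum_{i=1}^{N} \sum_{j=1}^{k_i} (X_{ij} - \bar{X}_{i.})^2 = \sum_{i=1}^{N} \left( \sum_{j=1}^{k_i} X_{ij}^2 - \frac{T_{i.}^2}{k_i} \right). \tag{2a}
$$

Between rows

$$
\sum_{i=1}^{N} k_i (\bar{X}_{i.} - \bar{X})^2 = \sum_{i=1}^{N} \frac{T_{i.}^2}{k_i} - \frac{T^2}{Nk}. \tag{3a}
$$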

When a single criterion of classification is involved, this procedure applies,
whether or not the classes contain the same number of observations. Double
classification with unequal frequencies is somewhat more difficult,² and we shall
consider here only the case of equal numbers of observations in the rows and
in the columns, respectively, that is, a rectangular array not necessarily square.³ For
the two-way analysis, both procedures outlined above are carried out, and then
the interaction may be obtained in any one of three ways. The usual way is to
regard it as

"Total", minus "Between rows", minus "Between columns".

But it may also be considered as

"Within columns", minus "Between rows"

or

"Within rows", minus "Between columns".

² See F. Yates, "The analysis of multiple classifications with unequal numbers in the
different classes," Journal of the American Statistical Association, Vol. 29 (1934); and
A. E. Brandt, "The analysis of variance in a 2×s table with disproportionate
frequencies," Journal of the American Statistical Association, Vol. 28 (1933).

³ For a simple treatment in three-way classification of unequal but proportional
representation between subclasses and between classes, see G. W. Snedecor, Statistical Methods,
p. 233-35; also an interesting example in F. C. Mills, Statistical Methods (1938 ed.),
Appendix E.


Either of the latter is perhaps more meaningful than the first, because this 
procedure presents the interaction, or ‘error’ sum of squares, as the logical 
third step in an effort to obtain the best estimate of population variance, through 
the successive stages of sample, or class, variances; their average; and, finally, 
this average freed of the effect of variability attributable to a second type of 
classification. 

Justification for this way of regarding the interaction is easily established if 
we look for a moment at the familiar fundamental identity for analysis of 
variance: 

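$$
\begin{aligned}
\sum_{i=1}^{N} \sum_{j=1}^{k} (X_{ij} - \bar{X})^2
&= N \sum_{j=1}^{k} (\bar{X}_{.j} - \bar{X})^2 + \sum_{j=1}^{k} \sum_{i=1}^{N} (X_{ij} - \bar{X}_{.j})^2 \\
&= N \sum_{j=1}^{k} (\bar{X}_{.j} - \bar{X})^2 + k \sum_{i=1}^{N} (\bar{X}_{i.} - \bar{X})^2 \\
&\quad + \left\{ \sum_{i=1}^{N} \sum_{j=1}^{k} (X_{ij} - \bar{X}_{.j})^2 - k \sum_{i=1}^{N} (\bar{X}_{i.} - \bar{X})^2 \right\}.
\end{aligned}
$$

Or, similarly,

$$
\begin{aligned}
\sum_{i=1}^{N} \sum_{j=1}^{k} (X_{ij} - \bar{X})^2
&= k \sum_{i=1}^{N} (\bar{X}_{i.} - \bar{X})^2 + \sum_{i=1}^{N} \sum_{j=1}^{k} (X_{ij} - \bar{X}_{i.})^2 \\
&= k \sum_{i=1}^{N} (\bar{X}_{i.} - \bar{X})^2 + N \sum_{j=1}^{k} (\bar{X}_{.j} - \bar{X})^2 \\
&\quad + \left\{ \sum_{i=1}^{N} \sum_{j=1}^{k} (X_{ij} - \bar{X}_{i.})^2 - N \sum_{j=1}^{k} (\bar{X}_{.j} - \bar{X})^2 \right\}.
\end{aligned}
$$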


In the computation notation adopted (dropping the subscripts from N and k 
since class frequencies are equal), the interaction is either 


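$$
\sum_{j=1}^{k} \left( \sum_{i=1}^{N} X_{ij}^2 - \frac{T_{.j}^2}{N} \right) - \left( \sum_{i=1}^{N} \frac{T_{i.}^2}{k} - \frac{T^2}{Nk} \right)
$$

or

$$
\sum_{i=1}^{N} \left( \sum_{j=1}^{k} X_{ij}^2 - \frac{T_{i.}^2}{k} \right) - \left( \sum_{j=1}^{k} \frac{T_{.j}^2}{N} - \frac{T^2}{Nk} \right),
$$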


the elements of which already have been computed. 
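
The agreement of the three routes to the interaction can be seen on a small rectangular array. The Python below is a sketch on hypothetical data (a 2 by 3 array chosen only for easy arithmetic, not the example of Table II); the function name is invented.

```python
def interaction_three_ways(table):
    """table: N rows of k observations each (one observation per cell)."""
    n_rows, n_cols = len(table), len(table[0])
    sum_sq = sum(x * x for row in table for x in row)
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (n_rows * n_cols)              # T^2 / Nk
    row_corr = sum(sum(row) ** 2 for row in table) / n_cols        # sum of T_i.^2 / k
    col_corr = sum(sum(col) ** 2 for col in zip(*table)) / n_rows  # sum of T_.j^2 / N

    total = sum_sq - correction
    between_rows = row_corr - correction
    between_cols = col_corr - correction
    within_rows = sum_sq - row_corr
    within_cols = sum_sq - col_corr

    return (
        total - between_rows - between_cols,   # the usual way
        within_cols - between_rows,            # second way
        within_rows - between_cols,            # third way
    )

# Hypothetical 2 x 3 array.
ways = interaction_three_ways([[1, 2, 3], [4, 6, 8]])
```

Because every quantity is built from the same three correction terms, any disagreement among the three values points to an arithmetic slip rather than to the data.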
Table II demonstrates the procedure applied to a simple example involving
four rows and four columns, the final computations being:

Total           = 2686 − 2450.25    = 235.75
Between columns = 2461.50 − 2450.25 = 11.25
Between rows    = 2463 − 2450.25    = 12.75
Interaction     = 224.5 − 12.75     = 211.75, or 223 − 11.25 = 211.75


TABLE II

[The body of the table, which sets each observation beside its square, is not legible in this copy. The marginal computations that survive are given below.]

             Row       Sum of                      Sum of sq. dev.
             total     squares     Correction      within rows
Row 1          46        564          529               35
Row 2          52        722          676               46
Row 3          54        854          729              125
Row 4          46        546          529               17
All rows      198       2686         2463              223

Sum of sq. dev. within cols.: 558 − 529 = 29 and 685 − 600.25 = 84.75 survive for two of the four columns; for all columns, 2686 − 2461.50 = 224.50.


The analysis of variance table is as follows:

                     Sum of squares    Degrees of    Mean square
                     of deviations     freedom       deviation
Total                    235.75           15
Within columns           224.50           12           18.71
Between columns           11.25            3            3.75
Within rows              223.00           12           18.58
Between rows              12.75            3            4.25
Interaction              211.75            9           23.53
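
The arithmetic of this example can be rechecked from the quantities that the text and Table II supply. The Python below is a sketch that assumes the grand total 198, the total sum of squares 2686, the row totals 46, 52, 54, 46 and the sum of the column correction terms 2461.50 as read from the text; the variable names are hypothetical.

```python
grand_total = 198
sum_sq = 2686
n_rows = n_cols = 4
row_totals = [46, 52, 54, 46]
col_correction_sum = 2461.50          # sum of (column total)^2 / 4, as printed

correction = grand_total ** 2 / (n_rows * n_cols)           # T^2 / Nk
row_correction_sum = sum(t * t for t in row_totals) / n_cols

total = sum_sq - correction
between_cols = col_correction_sum - correction
between_rows = row_correction_sum - correction
within_cols = sum_sq - col_correction_sum
within_rows = sum_sq - row_correction_sum
interaction = total - between_rows - between_cols

# Mean squares: each sum of squares divided by its degrees of freedom.
mean_squares = {
    "Within columns": round(within_cols / 12, 2),
    "Between columns": round(between_cols / 3, 2),
    "Within rows": round(within_rows / 12, 2),
    "Between rows": round(between_rows / 3, 2),
    "Interaction": round(interaction / 9, 2),
}
```

The two alternative routes, `within_cols - between_rows` and `within_rows - between_cols`, give the same interaction and so serve as the check the method is designed to provide.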


Extension to more complex experimental designs involving more than two 
criteria of classification requires merely to observe that the interaction becomes 
a quantity of the order of Within A-type classes minus Between B classes minus 
Between C classes, as well as Total minus Between A classes minus Between B
classes minus Between C classes. If, for example, the above illustration had
been designed to embody a "varieties" classification represented by the Latin
square

[4 × 4 arrangement of the varieties A, B, C, D; the layout is not legible in this copy]

the additional computation would be as follows:


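                  A              B              C              D            Total
               X    X²        X    X²        X    X²        X    X²
               ..   ..        16   256       10   100       12   144
               ..   ..        18   324       14   196       11   121
               ..   ..        20   400        9    81       18   324
               ..   ..        13   169       10   100       14   196
Sum            33   275       67  1149       43   477       55   785      198   2686
Correction       272.25        1122.25         462.25         756.25    = 2613.00
Sum of sq. dev.    2.75          26.75          14.75          28.75    =   73.00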

Within varieties = 2.75 + 26.75 + 14.75 + 28.75 = 73.00 
Between varieties = 272.25 + 1122.25 + 462.25 + 756.25 — 2450.25 = 162.75 

Most complex experiments will not be concerned with the ‘within classes” 
sum of squares of deviations for all the criteria involved, but if this sum has been 
computed for two or more criteria, a check on the later stages of the work is had 
by observing the alternative ways of computing the interaction, which in this 
case are: 


 235.75       73.00      224.50      223.00
 −11.25      −11.25      −12.75      −11.25
 −12.75      −12.75     −162.75     −162.75
−162.75
 ------      ------      ------      ------
  49.00       49.00       49.00       49.00
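
These alternative routes can be recomputed from the figures already given. The Python below is a sketch using the variety totals 33, 67, 43 and 55 (the first inferred from the printed correction term 272.25) together with the sums of squares from the two-way analysis; the variable names are hypothetical.

```python
sum_sq = 2686
correction = 2450.25                      # T^2 / 16 from the two-way analysis
variety_totals = [33, 67, 43, 55]         # four observations per variety

variety_correction_sum = sum(t * t for t in variety_totals) / 4
between_varieties = variety_correction_sum - correction
within_varieties = sum_sq - variety_correction_sum

total, between_cols, between_rows = 235.75, 11.25, 12.75
within_cols, within_rows = 224.50, 223.00

# Four routes to the residual sum of squares.
routes = [
    total - between_cols - between_rows - between_varieties,
    within_varieties - between_cols - between_rows,
    within_cols - between_rows - between_varieties,
    within_rows - between_cols - between_varieties,
]
```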

For the three-way classification of the Latin square the analysis of variance 
table is: 
                       Sum of squares    Degrees of    Mean square
                       of deviations     freedom       deviation
Total                      235.75           15
Between varieties          162.75            3           54.25
Within varieties            73.00           12            6.08
Between columns             11.25            3            3.75
Between rows                12.75            3            4.25
Interaction                 49.00            6            8.17


The procedure outlined, as a uniform method of computation in analysis 
involving a single criterion of classification, with or without equal numbers in 
the classes, or involving multiple criteria, has several things to recommend it. 
An important consideration to the worker who is not primarily a statistician is
that he or an assistant with little training can perform the mechanical work with
confidence. Also, the "within classes" sum of squares is computed directly
and not as a difference. The sums of squared deviations leading to the sample
variances are exhibited in explicit form for inspection, and test of significance if 
desired. Herein frequently is found an important clue, a warning signal or a 
hint leading to re-examination of the sampling procedure. 

As a final point, it is worthy of notice that the method facilitates use of 
analysis of variance as a technique of preliminary investigation. If the observed
data have been obtained more or less fortuitously, and not as the result of rigid 
experimental design, rows or columns may be eliminated or combined with a 
minimum of labor, thus permitting testing of various combinations of data. 

THE GEORGE WASHINGTON UNIVERSITY, 
WASHINGTON, D. C.


ANNOUNCEMENT CONCERNING COMPUTATION OF 
MATHEMATICAL TABLES 


A Project for the Computation of Mathematical Tables, sponsored by Dr. 
Lyman J. Briggs, Director of the National Bureau of Standards, is being con- 
ducted by the Work Projects Administration for the City of New York. The 
Project has been in operation since January 1, 1938, under the technical super- 
vision of Dr. Arnold N. Lowan. 

An agenda of the Project, listing tables completed, in progress and under con- 
sideration is given below: 


COMPLETED TABLES 


1. A table of exponentials for the following ranges, intervals and number of 
decimals. 


Range Interval No. of Decimals 
—2.5000 to 1.0000 0.0001 18 
1.0000 to 2.5000 0.0001 15 
2.500 to 5.000 0.001 15 
5.00 to 10.00 0.01 12 


2. A table of sines and cosines for the range from 0 to 25 radians at intervals 
of 10° to 8 places of decimals. 
3. A table of the first 10 powers of the integers from 1 to 1,000. 


TABLES IN PROGRESS 


(A) Computations completed, manuscripts in process of preparation.
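1. A table of the functions

$$
\mathrm{Si}(x) = \int_0^x \frac{\sin t}{t}\,dt; \qquad \mathrm{Ci}(x) = \int_\infty^x \frac{\cos t}{t}\,dt;
$$

$$
\mathrm{Ei}(x) = \int_{-\infty}^x \frac{e^{t}}{t}\,dt; \qquad -\mathrm{Ei}(-x) = \int_x^\infty \frac{e^{-t}}{t}\,dt,
$$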
for the range between 0 and 2 at intervals of 10°“, 9 places of decimals. 
2. A table of the functions defined under (1) for the range between 0
and 10 at intervals of 10 ° to nine significant figures. 
3. A table of circular and hyperbolic sines and cosines for the range 
between 0 and 2 at intervals of 10. 
4. A table of natural logarithms of integers from 0 to 100,000 to 16
places of decimals. 
5. A table of natural logarithms of decimal numbers from 0.0000 to
10.0000 at intervals of 10 * to 16 places of decimals. 
6. A series of Physical Tables

(a) $G = \dfrac{1}{\sqrt{1 - \beta^2}}$ for $\beta$ ranging from 0 to 0.9997 at various intervals.
This table also includes certain functions depending on $G$.

(b) Table of $N_\lambda = \dfrac{2\pi c}{\lambda^4 \left( e^{c_2/\lambda T} - 1 \right)}$, for $\lambda$ ranging between .25 and 10
microns at various intervals, and for $T$ = 1000, 1500, 2000, 2500,
3000, 3500, 6000°K.

(c) Table of $J_\lambda = \dfrac{c_1 \lambda^{-5}}{e^{c_2/\lambda T} - 1}$, for $T$ = 1000°K, $\lambda$ ranging from .5 to
20 microns at various intervals.

(d) Table of $N_{0-\lambda} = \int_0^\lambda N_\lambda \, d\lambda$, and $J_{0-\lambda} = \int_0^\lambda J_\lambda \, d\lambda$ for $T$ =
1000°K. Range of $\lambda$, the same as for $J_\lambda$.

(e) Table of the ratios
$\dfrac{J_{0-\lambda}}{J_{0-\infty}}$, $\dfrac{N_{0-\lambda}}{N_{0-\infty}}$, $\dfrac{J_\lambda}{(J_\lambda)_{\max}}$, and $\dfrac{N_\lambda}{(N_\lambda)_{\max}}$.
Range for $\lambda$, same as under (c).

(B) Computations in progress.
7. Table of the Probability Functions

$$
\frac{2}{\sqrt{\pi}}\, e^{-x^2} \quad \text{and} \quad \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt
$$


for x ranging from 0 to 1 at intervals of 10“ and from 1 to 6 at inter- 
vals of 10°. 
8. Table of the Probability Functions

$$
\sqrt{\frac{2}{\pi}}\, e^{-x^2/2} \quad \text{and} \quad \sqrt{\frac{2}{\pi}} \int_0^x e^{-t^2/2}\,dt
$$


for xz ranging between 0 and 1 at intervals of 10‘ and from 1 to 8.4 
at intervals of 10°. 


9. Table of Bessel Functions $J_0(z)$ and $J_1(z)$ for complex arguments,
$z = re^{i\theta}$, where $r$ ranges from 0 to 10 at intervals of 0.01 and $\theta$ ranges
from 0° to 90° at intervals of 5°.


10. Table of tan x and cot x for x ranging between 0 and 2 at intervals
of 10“. 


11. Table of the integrals $\int_0^1 x^k \sin n\pi x \, dx$ and $\int_0^1 x^k \cos n\pi x \, dx$ for
$n = 0, 1, 2, 3, \ldots, 100$ and $k = 0, 1, 2, 3, 4, 5$.


TABLES UNDER CONSIDERATION 


1. A 12 decimal place table of inverse tangents to radian measure. The
table will include the following ranges: From 0 to 3 at intervals of 10°, 
from 3 to 10 at intervals of 10~°, from 10 to 40 at intervals of 10’, from 40 
to 100 at intervals of 1, and from 100 to 1000 at intervals of 10. 

2. Table of Bessel Functions $J_{\pm 1/3}(z)$, $J_{\pm 2/3}(z)$, $J_{\pm 1/4}(z)$, $Y_0(z)$, $Y_1(z)$ and
$K_{\pm 1/3}(z)$, for the range and for intervals similar to those for $J_0(z)$ and $J_1(z)$.


3. Table of $Q_n(x) = \sqrt{\dfrac{\pi}{2x}}\, J_{n+1/2}(x)$ for $x$ ranging between 0 and 10 at intervals
of 0.01 and for $n = 1, 2, 3, \ldots, 10$.

4. Table of Gamma Functions for complex arguments $a + bi$, where $a$ and $b$
range from 0 to 5 at intervals of 0.05.

5. Table of Elliptic Functions for arguments z = x + iy, where x and y
range from 0 to 37 at intervals of 0.01. 


6. Table of the function $A(x, y)$ [definition not legible in this copy], where $x$ ranges
from 0 to 1 at intervals of 0.01 and $y$ ranges from 0 to 4 at intervals of 0.01.

7. Table of Temperature and Density of Stars, for "Point-Source" models.

The directors of the Project are anxious to hear from scientific colleagues 
concerning work in progress elsewhere as well as concerning suggestions for new 
tables. In particular, they would appreciate suggestions concerning new tables 
which may be of interest in the field of statistics. All suggestions will receive 
careful consideration. 

Communications concerning new tables or work in progress elsewhere should 
be addressed to Dr. Arnold N. Lowan, Chief Project Supervisor, Project for the 
Computation of Mathematical Tables, 475 Tenth Avenue, New York. 








THE INSTITUTE OF MATHEMATICAL STATISTICS 
(Organized September 12, 1935) 


OFFICERS FOR 1939 
President: 
P. R. Rider, Washington University, St. Louis, Mo.


Vice-Presidents: 
C. C. Craig, University of Michigan, Ann Arbor, Michigan.
S. S. Wilks, Princeton University, Princeton, N. J.


Secretary-Treasurer: 
A. T. Craig, University of Iowa, Iowa City, Iowa.


The purpose of the Institute of Mathematical Statistics is to stimulate 
research in the mathematical theory of statistics and to promote coöperation
between the field of pure research and the fields of application.


Membership dues including subscription to the ANNALS OF MATHEMATICAL 
STATISTICS are $5.00 per year. The dues and inquiries regarding membership
in the Institute should be sent to the Secretary-Treasurer of the
Institute. 





